Learn more about SNAP and GLYNT, WattzOn’s data extraction services for utility bills. Learn why we built them and how we decided which features to include. Interview by Shaul Stone (SS).
SS: Recently I had the opportunity to speak with David Nelson, Director of Product Management at WattzOn. We sat down to discuss machine learning, the SNAP products, and what they mean to him, his company, and his customers.
SS: Hey, David. Thanks for sitting down with me. So, can you tell me a little bit about WattzOn’s SNAP? What is it, exactly?
DAVID: That is my favorite question these days. SNAP is a machine learning system that extracts data from utility bills. We built it in-house from the ground up. For commercial bills, which are often very complex, the customer selects the fields they want, and a customized machine learning model is built just for their use. Because we can customize SNAP to any bill, it can handle energy, water, sewer and waste: any commodity that comes with a bill.
We have two products that run on our machine learning system: SNAP, which provides the pre-trained models, and GLYNT, which provides the customized models. GLYNT adds a data verification and review cycle that ensures the data is accurate to the 99.5% level. This is important because our customers need highly accurate utility bill data.
SS: What’s the key innovation?
DAVID: SNAP started with our energy customers at WattzOn, where we provide utility bill data to customers in the cleantech and smart home industries. Utility bills across the U.S. come in a multitude of layouts. Those differences make extracting data from the bills time-consuming and expensive. Software engineers are reduced to writing hand-coded solutions for each utility. So we set out to automate data extraction from documents. We needed to account for the layout variations and reach accuracy in the “golden zone” of 98% or higher. That’s exactly what we built, but it was only the beginning…
SS: Only the beginning?
DAVID: Yes. We soon realized we had automated data extraction with high precision and low error rates, using minuscule training sets. This is a really big deal. We could account for document variations and custom field requirements. What we built has value for other industries that require high accuracy and rely on semi-structured data trapped in faxes, PDFs and scanned documents. We serve these customers through GLYNT.AI. We serve the energy market with two products, SNAP and GLYNT.
SS: Can you explain the SNAP features in a bit more detail? How do they benefit your customers?
DAVID: Automated data extraction is faster, cheaper and more accurate than the alternatives. Even before GLYNT’s verification and review cycle, SNAP’s accuracy is very high, typically 98%. So the machine learning itself produces industry-leading results.
Our studies show that errors are costly to fix: 11 times the cost of the data extraction itself. Every machine learning system is challenged by edge cases, and when SNAP has low confidence in a result (e.g., less than 50% confident), it leaves the data field empty. That way bad data is neither perpetuated nor hidden. SNAP points humans to these exceptions so they can be fixed easily if needed. This is the cheapest way to reach the 99.5% accuracy the energy markets need for invoice payments, energy benchmarking and so on. And every fix makes our system smarter. GLYNT includes this combination of automated and human review.
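The confidence gate David describes (leave a field empty below a threshold and send it to a human) can be sketched in a few lines of Python. This is an illustrative sketch, not WattzOn’s implementation: the FieldPrediction and ExtractionResult types, the apply_confidence_gate function, and the sample field names and values are hypothetical. Only the 50% threshold comes from the interview.

```python
from dataclasses import dataclass, field
from typing import Optional

# Threshold taken from the interview's example ("lower than 50% confident").
CONFIDENCE_THRESHOLD = 0.5


@dataclass
class FieldPrediction:
    """Hypothetical: one extracted field plus the model's confidence in it (0.0 to 1.0)."""
    name: str
    value: str
    confidence: float


@dataclass
class ExtractionResult:
    """Hypothetical: final field values and the names of fields flagged for human review."""
    fields: dict[str, Optional[str]] = field(default_factory=dict)
    needs_review: list[str] = field(default_factory=list)


def apply_confidence_gate(predictions: list[FieldPrediction],
                          threshold: float = CONFIDENCE_THRESHOLD) -> ExtractionResult:
    """Keep confident values; blank out low-confidence ones and flag them for review."""
    result = ExtractionResult()
    for p in predictions:
        if p.confidence < threshold:
            # Leave the field empty instead of emitting a guess, and route it to a reviewer.
            result.fields[p.name] = None
            result.needs_review.append(p.name)
        else:
            result.fields[p.name] = p.value
    return result


# Usage with made-up example values:
preds = [
    FieldPrediction("account_number", "1234-5678", 0.97),
    FieldPrediction("total_due", "$412.50", 0.42),
]
r = apply_confidence_gate(preds)
print(r.fields)        # {'account_number': '1234-5678', 'total_due': None}
print(r.needs_review)  # ['total_due']
```

Leaving the field as None, rather than filling in the low-confidence value, is what keeps bad data from being silently passed downstream; the review list is the set of exceptions a human would fix.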
SNAP also flips the script on the size of the document set needed to train the machine learning models. SNAP needs just seven documents to learn from. With such small training sets, you don’t need to be a Fortune 500 company to create high-quality data extraction models. I like to say it cuts the big data players down to size.
SS: And the secret sauce is...?
DAVID: SNAP uses Mixed Formal Learning to quickly tune machine learning models and overcome document variations. Legacy solutions, like Zonal OCR or regular-expression scripts that extract data from PDFs, are brittle and require constant maintenance. Our system is flexible and easily accommodates new fields, new customer needs, and even new industries. It’s ibuprofen for data solution headaches. That’s a big deal.
SS: Getting a machine learning system off the ground must have had its fair share of challenges. What were some of the obstacles you overcame in the design and launch phases?
DAVID: Our biggest stumbling blocks were usability and quality assurance. These are common problems with machine learning solutions and tough to overcome.
SS: So, how did you get over the hump?
DAVID: First, we had to build the technology infrastructure. Our Elastic AI Workbench lets us easily add new machine learning capabilities to SNAP’s ecosystem and orchestrate them. It’s self-contained and elastic, so we can serve both small and high-volume customers.
Second, we had to verify our results and build in the infrastructure to monitor our performance. This was a process of trial and error, with the goal of creating a highly transparent AI system.
SS: What are your customers saying about SNAP? And where do you go next?
DAVID: Customers often tell us they’re glad they didn’t have to build this solution themselves. Not to toot our own horn, but we often get feedback on how simple and scalable a solution SNAP is. It relieves major chokepoints in company workflows and helps our customers scale.
So, the future of SNAP is clear: high-performance data extraction from bills, with clean, labeled data delivered by API. Our mission is to keep making this as easy as possible. We’re working hard on the user experience, so that our customers can handle as much of the bill document management and data extraction workflow as they want.
We also pay attention to roles. We don’t want software engineers doing detailed data work; that is best left to data analysts. We want to support good data security and governance, so we need an easy-to-integrate API that meets modern IT department requirements.
SS: David, thank you for taking the time to chat with me today!