Machine Learning Solution Breaks Human Review Bottleneck and Delivers Highly Accurate, Scalable Text Extraction from PDFs, Scans and Images
MOUNTAIN VIEW, CA (April 17, 2018) – WattzOn announced today that its flagship machine learning product, Mr Bill, has surpassed human-level performance (also known as superhuman accuracy) for automated text extraction from structured documents such as invoices, forms, lab reports and medical records. A recent WattzOn study on real-world documents shows an F1 score of 98% for fully supported fields, reflecting not only a high precision rate of 99%, but also a high recall rate of 98%. Additionally, Mr Bill trains on a small number of example documents, approaching one-shot learning levels. Hosted on a proprietary fast and elastic AI workbench, Mr Bill is a complete and scalable machine learning system for data extraction from structured documents.
For data in PDF form, Mr Bill achieves a measured F1 score of 98%. The product achieves an estimated F1 score of 96% for the same documents converted to image form. The F1 score is the harmonic mean of precision and recall, a balanced measure that is useful for comparison against current real-world data extraction systems. Manual systems typically have error rates of at least 10% before a final human review and correction cycle.
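As a rough illustration of how an F1 score combines precision and recall, the sketch below applies the standard harmonic-mean definition to the precision and recall figures quoted in this release. The function name f1_score is illustrative only and is not part of Mr Bill or any WattzOn API.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        # Avoid division by zero when nothing was extracted correctly.
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Precision and recall quoted in this release for fully supported fields.
    print(f"{f1_score(0.99, 0.98):.3f}")
```

Because the harmonic mean is pulled toward the lower of its two inputs, a system cannot reach a high F1 score by excelling at precision alone or recall alone.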
Mr Bill is a complete advanced machine learning system for highly scalable automated text extraction, supported by WattzOn’s Elastic AI Workbench, which includes pipelining, orchestration, job control, automatic hyperparameter tuning, and provision for an elastic API. Mr Bill’s key features are not only its beyond-human precision and recall rates, but also its uniquely low training costs. With only 20 examples per text field needed for training purposes, the product approaches the goal of one-shot learning. The low training requirements enable economical support for the natural variation of documents.
The accuracy study performed by the WattzOn team first measured the F1 score across four different document forms and layouts in PDF format. For each set, 20 – 30 example documents were used in training, and 80 – 100 documents were held out unseen for validation. Using a modest elastic computing cluster in a separate speed test, field data extraction processing speed measured about 7 seconds per field. This study demonstrated that Mr Bill’s classification capabilities beat the performance of current real world systems for all document formats.
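For readers who want to see how field-level precision, recall and F1 can be computed on a held-out validation set, here is a minimal sketch. It is not WattzOn's evaluation code. It assumes, purely for illustration, that a field counts as correct only on an exact string match, that a missing extraction counts as a false negative, and that a wrong value counts as both a false positive and a false negative. The field names and values are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# One document maps field names to extracted values (None = nothing extracted).
Doc = Dict[str, Optional[str]]


def score_fields(truth: List[Doc], predicted: List[Doc]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) over all fields in the held-out documents."""
    tp = fp = fn = 0
    for true_doc, pred_doc in zip(truth, predicted):
        for field, true_value in true_doc.items():
            pred_value = pred_doc.get(field)
            if pred_value is None:
                fn += 1          # field was missed entirely
            elif pred_value == true_value:
                tp += 1          # exact match (assumed correctness criterion)
            else:
                fp += 1          # a wrong value was returned...
                fn += 1          # ...and the right one was not found
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical two-document validation set with two fields each.
    truth = [{"account": "123", "total": "45.10"}, {"account": "456", "total": "12.00"}]
    predicted = [{"account": "123", "total": "45.10"}, {"account": "456", "total": None}]
    print(score_fields(truth, predicted))
```

Other scoring conventions exist, for example counting a wrong value only as a false positive, and they would shift the resulting numbers slightly.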
As organizations go through digital transformations, Mr Bill provides important business benefits:
- Liberating data trapped in structured documents
- Providing high-value semi-structured data for use within other machine learning and NLP algorithms
- Removing the processing-speed choke point that human data entry teams create for high-quality text extraction
Mr Bill, with its Elastic AI Workbench, enables text data extraction at the recall, precision, speed and scale to meet modern enterprise needs, reducing operating costs and expanding product and market opportunities. With very low training costs, Mr Bill’s ROI is measured in days and weeks, not months and years.
Mr Bill is immediately available. Please contact WattzOn for a customized product demonstration.
About Mr Bill
Mr Bill uses an innovative, complex application of machine learning algorithms to quickly and accurately extract data. Mr Bill is supported by an Elastic AI Workbench, an infrastructure that manages, trains, tunes, and extracts data in an elastic computing cloud, and provides a strong foundation for other AI initiatives. Mr Bill is available through a SaaS software license to enterprise customers in the healthcare, energy, defense, government and finance sectors.
About WattzOn
WattzOn provides text extraction data services through its complete machine learning system, MR BILL, and a vertical API for utility data, LINK, that covers 50 states and 94 million homes. WattzOn is a women-led company, with Martha Amram, CEO, and Sandra Carrico, VP of Engineering and Chief Data Scientist.
With performance that surpasses that of humans, and uncommonly low setup costs, MR BILL is a natural fit for markets with fragmented data sources and high data volumes, providing valuable semi-structured data for use in AI-powered analytics and digital enterprise automation. WattzOn’s LINK product serves market leaders in the solar, smart home and commercial utility bill processing markets, with expansion into consumer credit. Both products are available via SaaS software license.
Sources for Performance Comparison:
- Human Data Entry, Two Passes: Customer interviews.
- Software Systems: See https://ocrsolutions.com/typical-field-acceptance-rate-ocr-accuracy-level/ (F1 score calculated with 93% precision and 90% recall). https://blogs.dropbox.com/tech/2017/04/creating-a-modern-ocr-pipeline-using-computer-vision-and-deep-learning (F1 score calculated with 87% precision and 95% recall).
- System of Software and Human Review: Customer interviews.