Web data extraction using hybrid program synthesis: A combination of top-down and bottom-up inference

Mohammad Raza, Sumit Gulwani

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

18 Citations (Scopus)

Abstract

Automatic synthesis of web data extraction programs has been explored in a variety of settings, but in practice there remain various robustness and usability challenges. In this work we present a novel program synthesis approach which combines the benefits of deductive and enumerative synthesis strategies, yielding a semi-supervised technique with which concise programs expressible in standard languages can be synthesized from very few examples. We demonstrate improvement over existing techniques in terms of overall accuracy, number of examples required, and program complexity. Our method has been deployed as a web extraction feature in the mass market Microsoft Power BI product.
Original language: English
Title of host publication: SIGMOD '20: Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data
Publisher: Association for Computing Machinery
Pages: 1967-1978
Number of pages: 12
ISBN (Electronic): 9781450367356
Publication status: Published - 31 May 2020
Externally published: Yes
Event: 2020 ACM SIGMOD International Conference on Management of Data, SIGMOD 2020 - Portland, United States
Duration: 14 Jun 2020 to 19 Jun 2020

Publication series

Name: Proceedings of the ACM SIGMOD International Conference on Management of Data
ISSN (Print): 0730-8078

Conference

Conference: 2020 ACM SIGMOD International Conference on Management of Data, SIGMOD 2020
Country/Territory: United States
City: Portland
Period: 14/06/20 to 19/06/20
