TY - GEN
T1 - Web data extraction using hybrid program synthesis: A combination of top-down and bottom-up inference
T2 - 2020 ACM SIGMOD International Conference on Management of Data, SIGMOD 2020
AU - Raza, Mohammad
AU - Gulwani, Sumit
N1 - Publisher Copyright:
© 2020 Association for Computing Machinery.
PY - 2020/5/31
Y1 - 2020/5/31
AB - Automatic synthesis of web data extraction programs has been explored in a variety of settings, but in practice there remain various robustness and usability challenges. In this work we present a novel program synthesis approach which combines the benefits of deductive and enumerative synthesis strategies, yielding a semi-supervised technique with which concise programs expressible in standard languages can be synthesized from very few examples. We demonstrate improvement over existing techniques in terms of overall accuracy, number of examples required, and program complexity. Our method has been deployed as a web extraction feature in the mass market Microsoft Power BI product.
UR - http://www.scopus.com/inward/record.url?scp=85086271456&partnerID=8YFLogxK
U2 - 10.1145/3318464.3380608
DO - 10.1145/3318464.3380608
M3 - Conference contribution
T3 - Proceedings of the ACM SIGMOD International Conference on Management of Data
SP - 1967
EP - 1978
BT - SIGMOD '20: Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data
PB - Association for Computing Machinery
Y2 - 14 June 2020 through 19 June 2020
ER -