Abstract
Data transformation is a crucial step in data integration. While some transformations, such as liters to gallons, can be easily performed by applying a formula or a program to the input values, others, such as zip code to city, require sifting through a repository containing explicit value mappings. There are already powerful systems that provide formulae and algorithms for transformations. However, the automated identification of reference datasets to support value mapping remains largely unresolved. The Web is home to millions of tables, many of which contain explicit value mappings, in addition to the value mappings hidden behind Web forms. In this paper, we present DataXFormer, a transformation engine that leverages Web tables and Web forms to perform transformation tasks. In particular, we describe an inductive, filter-refine approach for identifying explicit transformations in a corpus of Web tables and an approach to dynamically retrieve and wrap Web forms. Experiments show that the combination of both resource types covers more than 80% of transformation queries formulated by real-world users.
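To make the filter-refine idea concrete, the following is a minimal sketch under stated assumptions, not the DataXFormer implementation. It assumes each Web table has already been reduced to a list of (input, output) row pairs held in memory, that a table is relevant when it agrees with at least `min_support` known example pairs, and that conflicting candidates are resolved by a simple majority vote. The function name `find_transformations` and the parameters `min_support` and `max_rounds` are hypothetical and do not come from the paper.

```python
from collections import Counter


def find_transformations(tables, examples, queries, min_support=2, max_rounds=5):
    """Toy filter-refine lookup over in-memory two-column tables.

    tables:   list of tables, each a list of (x, y) row pairs (assumed format).
    examples: dict of known input -> output pairs supplied by the user.
    queries:  list of inputs whose outputs are wanted.
    """
    known = dict(examples)
    pending = set(queries) - set(known)

    for _ in range(max_rounds):
        if not pending:
            break
        votes = {q: Counter() for q in pending}

        for rows in tables:
            mapping = dict(rows)
            # Filter: keep only tables that agree with enough known pairs.
            support = sum(1 for x, y in known.items() if mapping.get(x) == y)
            if support < min_support:
                continue
            # Refine: collect candidate outputs for unanswered inputs.
            for q in pending:
                if q in mapping:
                    votes[q][mapping[q]] += 1

        # Majority vote per input (an assumption of this sketch).
        newly = {q: c.most_common(1)[0][0] for q, c in votes.items() if c}
        if not newly:
            break
        # Inductive step: answers found in this round act as examples next round.
        known.update(newly)
        pending -= set(newly)

    return {q: known.get(q) for q in queries}
```

Calling the function with a few zip code to city examples and a list of unanswered zip codes returns a dictionary that maps each query to the value found, or to `None` when no sufficiently supported table contains it. Feeding newly found pairs back in as examples lets a later round admit tables that shared too few of the original examples to pass the filter.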
Original language | English |
---|---|
Publication status | Published - 2015 |
Event | 7th Biennial Conference on Innovative Data Systems Research, CIDR 2015 - Asilomar, United States. Duration: 4 Jan 2015 → 7 Jan 2015 |
Conference
Conference | 7th Biennial Conference on Innovative Data Systems Research, CIDR 2015 |
---|---|
Country/Territory | United States |
City | Asilomar |
Period | 4/01/15 → 7/01/15 |