DataXFormer: Leveraging the web for semantic transformations

Ziawasch Abedjan, John Morcos, Michael Gubanov, Ihab F. Ilyas, Michael Stonebraker, Paolo Papotti, Mourad Ouzzani

Research output: Contribution to conference, Paper, peer-review

21 Citations (Scopus)

Abstract

Data transformation is a crucial step in data integration. While some transformations, such as liters to gallons, can be easily performed by applying a formula or a program on the input values, others, such as zip code to city, require sifting through a repository containing explicit value mappings. There are already powerful systems that provide formulae and algorithms for transformations. However, the automated identification of reference datasets to support value mapping remains largely unresolved. The Web is home to millions of tables with many containing explicit value mappings. This is in addition to value mappings hidden behind Web forms. In this paper, we present DataXFormer, a transformation engine that leverages Web tables and Web forms to perform transformation tasks. In particular, we describe an inductive, filter-refine approach for identifying explicit transformations in a corpus of Web tables and an approach to dynamically retrieve and wrap Web forms. Experiments show that the combination of both resource types covers more than 80% of transformation queries formulated by real-world users.
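
To make the table-based idea concrete, here is a minimal sketch of example-driven lookup over a small in-memory corpus. It is not DataXFormer's implementation: the function names (`find_supporting_tables`, `transform`), the `min_support` threshold, the majority-vote reconciliation, and the assumption that every table is a rectangular list of tuples are all illustrative choices made for this sketch. The filter step keeps column pairs that contain enough of the user's example mappings; the second step reads answers for the remaining input values from those column pairs in a single pass.

```python
from collections import Counter


def find_supporting_tables(tables, examples, min_support=2):
    """Filter step: return (table index, source column, target column)
    triples whose rows contain at least `min_support` example pairs."""
    hits = []
    for t_idx, rows in enumerate(tables):
        if not rows:
            continue
        n_cols = len(rows[0])  # assumes rectangular tables
        for src in range(n_cols):
            for dst in range(n_cols):
                if src == dst:
                    continue
                pairs = {(row[src], row[dst]) for row in rows}
                support = sum(1 for ex in examples if ex in pairs)
                if support >= min_support:
                    hits.append((t_idx, src, dst))
    return hits


def transform(tables, examples, queries, min_support=2):
    """Look up each query value in the supporting column pairs and
    return the most frequently seen target value (None if unseen)."""
    hits = find_supporting_tables(tables, examples, min_support)
    votes = {q: Counter() for q in queries}
    for t_idx, src, dst in hits:
        for row in tables[t_idx]:
            if row[src] in votes:
                votes[row[src]][row[dst]] += 1
    return {q: (c.most_common(1)[0][0] if c else None) for q, c in votes.items()}


# Toy corpus: three hypothetical "Web tables" as lists of tuples.
tables = [
    [("02139", "Cambridge"), ("10001", "New York"), ("94105", "San Francisco")],
    [("02139", "MA"), ("10001", "NY")],
    [("94105", "San Francisco", "CA"), ("02139", "Cambridge", "MA"),
     ("60601", "Chicago", "IL"), ("10001", "New York", "NY")],
]
examples = [("02139", "Cambridge"), ("10001", "New York")]
queries = ["94105", "60601", "99999"]

result = transform(tables, examples, queries)
```

In this toy run the zip-to-state table is filtered out because it matches none of the examples, and a query value that appears in no supporting table maps to `None`. A real system would also need fuzzy matching, table-quality scoring, and an index over the corpus, none of which is attempted here.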

Original language: English
Publication status: Published - 2015
Event: 7th Biennial Conference on Innovative Data Systems Research, CIDR 2015 - Asilomar, United States
Duration: 4 Jan 2015 to 7 Jan 2015

Conference

Conference: 7th Biennial Conference on Innovative Data Systems Research, CIDR 2015
Country/Territory: United States
City: Asilomar
Period: 4/01/15 to 7/01/15
