Building data civilizer pipelines with an advanced workflow engine

Essam Mansour, Dong Deng, Raul Castro Fernandez, Abdulhakim A. Qahtan, Wenbo Tao, Ziawasch Abedjan, Ahmed Elmagarmid, Ihab F. Ilyas, Samuel Madden, Mourad Ouzzani, Michael Stonebraker, Nan Tang

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

5 Citations (Scopus)

Abstract

In order for an enterprise to gain insight into its internal business and the changing outside environment, it is essential to provide the relevant data for in-depth analysis. Enterprise data is usually scattered across departments and geographic regions and is often inconsistent. Data scientists spend the majority of their time finding, preparing, integrating, and cleaning relevant data sets. Data Civilizer is an end-To-end data preparation system. In this paper, we present the complete system, focusing on our new workflow engine, a superior system for entity matching and consolidation, and new cleaning tools. Our workflow engine allows data scientists to author, execute and retrofit data preparation pipelines of different data discovery and cleaning services. Our end-To-end demo scenario is based on data from the MIT data warehouse and e-commerce data sets.

Original languageEnglish
Title of host publicationProceedings - IEEE 34th International Conference on Data Engineering, ICDE 2018
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages1593-1596
Number of pages4
ISBN (Electronic)9781538655207
DOIs
Publication statusPublished - 24 Oct 2018
Event34th IEEE International Conference on Data Engineering, ICDE 2018 - Paris, France
Duration: 16 Apr 201819 Apr 2018

Publication series

NameProceedings - IEEE 34th International Conference on Data Engineering, ICDE 2018

Conference

Conference34th IEEE International Conference on Data Engineering, ICDE 2018
Country/TerritoryFrance
CityParis
Period16/04/1819/04/18

Keywords

  • Data Cleaning
  • Data Discovery
  • Data Integration

Fingerprint

Dive into the research topics of 'Building data civilizer pipelines with an advanced workflow engine'. Together they form a unique fingerprint.

Cite this