NADEEF: A generalized data cleaning system

Amr Ebaid*, Ahmed Elmagarmid, Ihab F. Ilyas, Mourad Ouzzani, Jorge Arnulfo Quiane-Ruiz, Nan Tang, Si Yin

*Corresponding author for this work

Research output: Contribution to journal > Conference article > peer-review

38 Citations (Scopus)

Abstract

We present NADEEF, an extensible, generic and easy-to-deploy data cleaning system. NADEEF distinguishes between a programming interface and a core to achieve generality and extensibility. The programming interface allows users to specify data quality rules by writing code that implements predefined classes. These classes uniformly define what is wrong with the data and (possibly) how to fix it. We will demonstrate the following features provided by NADEEF. (1) Heterogeneity: The programming interface can be used to express many types of data quality rules beyond the well-known CFDs (FDs), MDs and ETL rules. (2) Interdependency: The core algorithms can interleave multiple types of rules to detect and repair data errors. (3) Deployment and extensibility: Users can easily customize NADEEF by defining new types of rules, or by extending the core. (4) Metadata management and data custodians: We show a live data quality dashboard to effectively involve users in the data cleaning process.
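The split between a rule-writing interface and a core can be illustrated with a minimal sketch. This is not NADEEF's actual API: every name below (`Rule`, `Violation`, `Fix`, `FunctionalDependency`, `clean`) is hypothetical, the majority-vote repair is an assumed heuristic, and the toy core runs rules one after another rather than interleaving them as the abstract describes.

```python
from abc import ABC, abstractmethod
from collections import Counter, defaultdict
from dataclasses import dataclass

# All names here are hypothetical and are not taken from the NADEEF code base.


@dataclass(frozen=True)
class Violation:
    rule: str
    cells: tuple  # (row_index, column) pairs involved in the error


@dataclass(frozen=True)
class Fix:
    row: int
    column: str
    value: object


class Rule(ABC):
    """Uniformly states what is wrong with the data and (possibly) how to fix it."""

    @abstractmethod
    def detect(self, rows):
        """Return a list of Violation objects."""

    def repair(self, rows, violation):
        """Return candidate Fix objects; by default a rule only detects."""
        return []


class FunctionalDependency(Rule):
    """lhs -> rhs: rows that agree on lhs must also agree on rhs."""

    def __init__(self, lhs, rhs):
        self.lhs, self.rhs = lhs, rhs

    def detect(self, rows):
        groups = defaultdict(list)
        for i, row in enumerate(rows):
            groups[row[self.lhs]].append(i)
        violations = []
        for indices in groups.values():
            if len({rows[i][self.rhs] for i in indices}) > 1:
                cells = tuple((i, self.rhs) for i in indices)
                violations.append(Violation(f"{self.lhs}->{self.rhs}", cells))
        return violations

    def repair(self, rows, violation):
        # Assumed heuristic: overwrite with the most frequent value in the group.
        values = Counter(rows[i][col] for i, col in violation.cells)
        majority = values.most_common(1)[0][0]
        return [Fix(i, col, majority) for i, col in violation.cells
                if rows[i][col] != majority]


def clean(rows, rules):
    """Toy core: run each rule once, apply its fixes, return all violations."""
    found = []
    for rule in rules:
        for violation in rule.detect(rows):
            found.append(violation)
            for fix in rule.repair(rows, violation):
                rows[fix.row][fix.column] = fix.value
    return found


rows = [
    {"zip": "47906", "city": "West Lafayette"},
    {"zip": "47906", "city": "West Lafayette"},
    {"zip": "47906", "city": "Lafayette"},
]
violations = clean(rows, [FunctionalDependency("zip", "city")])
```

In this sketch a new rule type is added by subclassing `Rule`, while `clean` stays unchanged. That is the kind of separation the abstract attributes to the programming interface and the core.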

Original language: English
Pages (from-to): 1218-1221
Number of pages: 4
Journal: Proceedings of the VLDB Endowment
Volume: 6
Issue number: 12
Publication status: Published - Aug 2013
Event: 39th International Conference on Very Large Data Bases, VLDB 2013 - Trento, Italy
Duration: 26 Aug 2013 to 30 Aug 2013
