Detecting inconsistencies in private data with secure function evaluation

Nilothpal Talukder, Mourad Ouzzani, Ahmed Khalifa Elmagarmid, Mohamed Yakout

Research output: Book/Report › Commissioned report › peer-review

Abstract

Erroneous and inconsistent data, often referred to as ‘dirty data’, is a major concern for businesses. Prevalent techniques for improving data quality consist of discovering data quality rules, identifying records that violate those rules, and then modifying the data to remove those violations. Most of the work described in the literature deals with cases where both the data and the rules are visible to the party in charge of cleaning the data. However, consider the case where two parties with data and data quality rules wish to cooperate in data cleaning under two restrictions: (1) neither party is willing to share its data due to its sensitive nature, and (2) the data quality rules may reveal information about the content of the data and may be considered a private asset of the business. The question then is how to clean the data without having to share either the data or the rules. While the data cleaning process involves several phases, our focus in this paper is on detecting inconsistent data. We propose a novel inconsistency detection protocol that preserves the privacy of both the data and the data quality rules without the use of a third party. Inconsistent data is defined as all records in a database that violate some conditional functional dependencies (CFDs). Our approach is based primarily on the secure multiparty computation framework. We present a complexity analysis of our protocol and a series of experiments evaluating its performance.
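
To make the notion of a CFD violation concrete, the following is a minimal plaintext sketch in Python. It is not the privacy-preserving protocol proposed in the report: it assumes a single party can see both the records and the rule, which is exactly the setting the report avoids. The function names, the `_` wildcard convention, and the sample records are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

WILDCARD = "_"  # assumed convention: "_" in a pattern matches any value


def matches(record, attrs, pattern):
    """True if the record agrees with every constant in the pattern."""
    return all(p == WILDCARD or record[a] == p for a, p in zip(attrs, pattern))


def cfd_violations(records, lhs, rhs, lhs_pattern, rhs_pattern):
    """Return sorted indices of records violating the CFD (lhs -> rhs).

    The rule applies only to records whose lhs values match lhs_pattern.
    """
    violating = set()
    groups = defaultdict(list)
    for i, rec in enumerate(records):
        if not matches(rec, lhs, lhs_pattern):
            continue  # the rule does not apply to this record
        # Single-record violation: the rule fixes a constant on the
        # right-hand side and this record carries a different value.
        if rhs_pattern != WILDCARD and rec[rhs] != rhs_pattern:
            violating.add(i)
        groups[tuple(rec[a] for a in lhs)].append(i)
    # Multi-record violation: records agree on lhs but disagree on rhs.
    for idxs in groups.values():
        if len({records[i][rhs] for i in idxs}) > 1:
            violating.update(idxs)
    return sorted(violating)


# Hypothetical sample data.
records = [
    {"country": "UK", "zip": "EH4", "street": "Mayfield"},
    {"country": "UK", "zip": "EH4", "street": "Crichton"},
    {"country": "US", "zip": "10001", "street": "Main"},
    {"country": "US", "zip": "10001", "street": "Broadway"},
]

# Illustrative rule: for UK records only, zip determines street.
result = cfd_violations(records, ["country", "zip"], "street", ["UK", "_"], "_")
print(result)
```

Here the two UK records share a zip but differ on street, so both are flagged, while the US records fall outside the rule's condition. In the two-party setting described above, the records and the rule would be held by different parties, and neither could run this check directly.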
Original language: English
Publication status: Published - 2011
Externally published: Yes
