Aggregate query answering on possibilistic data with cardinality constraints

Graham Cormode*, Divesh Srivastava, Entong Shen, Ting Yu

*Corresponding author for this work

Research output: Contribution to journalConference articlepeer-review

10 Citations (Scopus)

Abstract

Uncertainties in data can arise for a number of reasons: when data is incomplete, contains conflicting information or has been deliberately perturbed or coarsened to remove sensitive details. An important case which arises in many real applications is when the data describes a set of possibilities, but with cardinality constraints. These constraints represent correlations between tuples encoding, e.g. that at most two possible records are correct, or that there is an (unknown) one-to-one mapping between a set of tuples and attribute values. Although there has been much effort to handle uncertain data, current systems are not equipped to handle such correlations, beyond simple mutual exclusion and co-existence constraints. Vitally, they have little support for efficiently handling aggregate queries on such data. In this paper, we aim to address some of these deficiencies, by introducing LICM (Linear Integer Constraint Model), which can succinctly represent many types of tuple correlations, particularly a class of cardinality constraints. We motivate and explain the model with examples from data cleaning and masking sensitive data, to show that it enables modeling and querying such data, which was not previously possible. We develop an efficient strategy to answer conjunctive and aggregate queries on possibilistic data by describing how to implement relational operators over data in the model. LICM compactly integrates the encoding of correlations, query answering and lineage recording. In combination with off-the-shelf linear integer programming solvers, our approach provides exact bounds for aggregate queries. Our prototype implementation demonstrates that query answering with LICM can be effective and scalable.

Original languageEnglish
Article number6228089
Pages (from-to)258-269
Number of pages12
JournalProceedings - International Conference on Data Engineering
DOIs
Publication statusPublished - 2012
Externally publishedYes
EventIEEE 28th International Conference on Data Engineering, ICDE 2012 - Arlington, VA, United States
Duration: 1 Apr 20125 Apr 2012

Fingerprint

Dive into the research topics of 'Aggregate query answering on possibilistic data with cardinality constraints'. Together they form a unique fingerprint.

Cite this