The similarity-aware relational intersect database operator

Wadha J. Al Marri*, Qutaibah Malluhi, Mourad Ouzzani, Mingjie Tang, Walid G. Aref

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

4 Citations (Scopus)

Abstract

Identifying similarities in large datasets is an essential operation in many applications such as bioinformatics, pattern recognition, and data integration. To make the underlying database system similarity-aware, the core relational operators have to be extended. Several similarity-aware relational operators have been proposed that introduce similarity processing at the database engine level, e.g., similarity joins and similarity group-by. This paper extends the semantics of the set intersection operator to operate over similar values. The paper describes the semantics of the similarity-based set intersection operator and develops an efficient query processing algorithm for evaluating it. The proposed operator is implemented inside an open-source database system, namely PostgreSQL. Several queries from the TPC-H benchmark are extended to include similarity-based set intersection predicates. Performance results demonstrate up to three orders of magnitude speedup over equivalent queries that employ only regular operators.

Original language: English
Title of host publication: Similarity Search and Applications - 7th International Conference, SISAP 2014, Proceedings
Editors: Agma Juci Machado Traina, Caetano Traina, Robson Leonardo Ferreira Cordeiro
Publisher: Springer Verlag
Pages: 164-175
Number of pages: 12
ISBN (Electronic): 9783319119878
Publication status: Published - 2014
Event: 7th International Conference on Similarity Search and Applications, SISAP 2014 - Los Cabos, Mexico
Duration: 29 Oct 2014 to 31 Oct 2014

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 8821
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 7th International Conference on Similarity Search and Applications, SISAP 2014
Country/Territory: Mexico
City: Los Cabos
Period: 29/10/14 to 31/10/14
