TY - GEN
T1 - The similarity-aware relational intersect database operator
AU - Al Marri, Wadha J.
AU - Malluhi, Qutaibah
AU - Ouzzani, Mourad
AU - Tang, Mingjie
AU - Aref, Walid G.
N1 - Publisher Copyright:
© Springer International Publishing Switzerland 2014.
PY - 2014
Y1 - 2014
N2 - Identifying similarities in large datasets is an essential operation in many applications such as bioinformatics, pattern recognition, and data integration. To make the underlying database system similarity-aware, the core relational operators have to be extended. Several similarity-aware relational operators have been proposed that introduce similarity processing at the database engine level, e.g., similarity joins and similarity group-by. This paper extends the semantics of the set intersection operator to operate over similar values. The paper describes the semantics of the similarity-based set intersection operator and develops an efficient query processing algorithm for evaluating it. The proposed operator is implemented inside an open-source database system, namely PostgreSQL. Several queries from the TPC-H benchmark are extended to include similarity-based set intersection predicates. Performance results demonstrate speedups of up to three orders of magnitude over equivalent queries that only employ regular operators.
AB - Identifying similarities in large datasets is an essential operation in many applications such as bioinformatics, pattern recognition, and data integration. To make the underlying database system similarity-aware, the core relational operators have to be extended. Several similarity-aware relational operators have been proposed that introduce similarity processing at the database engine level, e.g., similarity joins and similarity group-by. This paper extends the semantics of the set intersection operator to operate over similar values. The paper describes the semantics of the similarity-based set intersection operator and develops an efficient query processing algorithm for evaluating it. The proposed operator is implemented inside an open-source database system, namely PostgreSQL. Several queries from the TPC-H benchmark are extended to include similarity-based set intersection predicates. Performance results demonstrate speedups of up to three orders of magnitude over equivalent queries that only employ regular operators.
UR - http://www.scopus.com/inward/record.url?scp=84911019946&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-11988-5_15
DO - 10.1007/978-3-319-11988-5_15
M3 - Conference contribution
AN - SCOPUS:84911019946
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 164
EP - 175
BT - Similarity Search and Applications - 7th International Conference, SISAP 2014, Proceedings
A2 - Traina, Agma Juci Machado
A2 - Traina, Caetano
A2 - Cordeiro, Robson Leonardo Ferreira
PB - Springer Verlag
T2 - 7th International Conference on Similarity Search and Applications, SISAP 2014
Y2 - 29 October 2014 through 31 October 2014
ER -