TY - GEN
T1 - Duplicate elimination in space-partitioning tree indexes
AU - Eltabakh, M. Y.
AU - Ouzzani, Mourad
AU - Aref, Walid G.
PY - 2007
Y1 - 2007
N2 - Space-partitioning trees, like the disk-based trie, quadtree, kd-tree and their variants, are a family of access methods that index multi-dimensional objects. When indexing objects with non-zero extent, e.g., line segments and rectangles, space-partitioning trees such as the PMR quadtree, expanded MX-CIF quadtree, and extended kd-tree may replicate objects over multiple space partitions. As a result, the answer to a query over these indexes may include duplicates, i.e., the same object may be reported more than once, and these duplicates need to be eliminated. In this paper, we propose generic duplicate elimination techniques for the class of space-partitioning trees in the context of SP-GiST, an extensible indexing framework for realizing space-partitioning trees. The proposed techniques are embedded inside the INDEX-SCAN operator. Therefore, duplicate copies of the same object do not propagate in the query plan, and the elimination process is transparent to end users. Two cases for the index structures are considered, based on whether or not the objects' coordinates are stored inside the index tree. Theoretical and experimental analyses show that the proposed techniques achieve savings in storage requirements, I/O operations, and processing time when compared to adding a separate duplicate elimination operator to the query plan.
AB - Space-partitioning trees, like the disk-based trie, quadtree, kd-tree and their variants, are a family of access methods that index multi-dimensional objects. When indexing objects with non-zero extent, e.g., line segments and rectangles, space-partitioning trees such as the PMR quadtree, expanded MX-CIF quadtree, and extended kd-tree may replicate objects over multiple space partitions. As a result, the answer to a query over these indexes may include duplicates, i.e., the same object may be reported more than once, and these duplicates need to be eliminated. In this paper, we propose generic duplicate elimination techniques for the class of space-partitioning trees in the context of SP-GiST, an extensible indexing framework for realizing space-partitioning trees. The proposed techniques are embedded inside the INDEX-SCAN operator. Therefore, duplicate copies of the same object do not propagate in the query plan, and the elimination process is transparent to end users. Two cases for the index structures are considered, based on whether or not the objects' coordinates are stored inside the index tree. Theoretical and experimental analyses show that the proposed techniques achieve savings in storage requirements, I/O operations, and processing time when compared to adding a separate duplicate elimination operator to the query plan.
UR - http://www.scopus.com/inward/record.url?scp=46649121240&partnerID=8YFLogxK
U2 - 10.1109/SSDBM.2007.10
DO - 10.1109/SSDBM.2007.10
M3 - Conference contribution
AN - SCOPUS:46649121240
SN - 0769528686
SN - 9780769528687
T3 - Proceedings of the International Conference on Scientific and Statistical Database Management, SSDBM
BT - 19th International Conference on Scientific and Statistical Database Management, SSDBM 2007
T2 - 19th International Conference on Scientific and Statistical Database Management, SSDBM 2007
Y2 - 9 July 2007 through 11 July 2007
ER -