TY - GEN
T1 - Kernel-Based skyline cardinality estimation
AU - Zhang, Zhenjie
AU - Yang, Yin
AU - Cai, Ruichu
AU - Papadias, Dimitris
AU - Tung, Anthony
PY - 2009
Y1 - 2009
N2 - The skyline of a d-dimensional dataset consists of all points not dominated by others. The incorporation of the skyline operator into practical database systems necessitates an efficient and effective cardinality estimation module. However, existing theoretical work on this problem is limited to the case where all d dimensions are independent of each other, which rarely holds for real datasets. The state of the art Log Sampling (LS) technique simply applies theoretical results for independent dimensions to non-independent data anyway, sometimes leading to large estimation errors. To solve this problem, we propose a novel Kernel-Based (KB) approach that approximates the skyline cardinality with nonparametric methods. Extensive experiments with various real datasets demonstrate that KB achieves high accuracy, even in cases where LS fails. At the same time, despite its numerical nature, the efficiency of KB is comparable to that of LS. Furthermore, we extend both LS and KB to the k-dominant skyline, which is commonly used instead of the conventional skyline for high-dimensional data.
AB - The skyline of a d-dimensional dataset consists of all points not dominated by others. The incorporation of the skyline operator into practical database systems necessitates an efficient and effective cardinality estimation module. However, existing theoretical work on this problem is limited to the case where all d dimensions are independent of each other, which rarely holds for real datasets. The state of the art Log Sampling (LS) technique simply applies theoretical results for independent dimensions to non-independent data anyway, sometimes leading to large estimation errors. To solve this problem, we propose a novel Kernel-Based (KB) approach that approximates the skyline cardinality with nonparametric methods. Extensive experiments with various real datasets demonstrate that KB achieves high accuracy, even in cases where LS fails. At the same time, despite its numerical nature, the efficiency of KB is comparable to that of LS. Furthermore, we extend both LS and KB to the k-dominant skyline, which is commonly used instead of the conventional skyline for high-dimensional data.
KW - Cardinality estimation
KW - Kernel
KW - Non-parametric methods
KW - Skyline
UR - http://www.scopus.com/inward/record.url?scp=70849094969&partnerID=8YFLogxK
U2 - 10.1145/1559845.1559899
DO - 10.1145/1559845.1559899
M3 - Conference contribution
AN - SCOPUS:70849094969
SN - 9781605585543
T3 - SIGMOD-PODS'09 - Proceedings of the International Conference on Management of Data and 28th Symposium on Principles of Database Systems
SP - 509
EP - 521
BT - SIGMOD-PODS'09 - Proceedings of the International Conference on Management of Data and 28th Symposium on Principles of Database Systems
T2 - International Conference on Management of Data and 28th Symposium on Principles of Database Systems, SIGMOD-PODS'09
Y2 - 29 June 2009 through 2 July 2009
ER -