TY - GEN
T1 - Geometrically inspired itemset mining
AU - Verhein, Florian
AU - Chawla, Sanjay
PY - 2006
Y1 - 2006
N2 - In our geometric view, an itemset is a vector (Itemvector) in the space of transactions. Linear and potentially non-linear transformations can be applied to the itemvectors before mining patterns. Aggregation functions and interestingness measures can be applied to the transformed vectors and pushed inside the mining process. We show that interesting itemset mining can be carried out by instantiating four abstract functions: a transformation (g), an algebraic aggregation operator (o) and measures (f and F). For Frequent Itemset Mining (FIM), g and F are identity transformations, o is intersection and f is the cardinality. Based on this geometric view we present a novel algorithm that uses space linear in the number of 1-itemsets to mine all interesting itemsets in a single pass over the data, with no candidate generation. It scales (roughly) linearly in running time with the number of interesting itemsets. FIM experiments show that it outperforms FPGrowth on realistic datasets above a small support threshold (0.29% and 1.2% in our experiments)1.
AB - In our geometric view, an itemset is a vector (Itemvector) in the space of transactions. Linear and potentially non-linear transformations can be applied to the itemvectors before mining patterns. Aggregation functions and interestingness measures can be applied to the transformed vectors and pushed inside the mining process. We show that interesting itemset mining can be carried out by instantiating four abstract functions: a transformation (g), an algebraic aggregation operator (o) and measures (f and F). For Frequent Itemset Mining (FIM), g and F are identity transformations, o is intersection and f is the cardinality. Based on this geometric view we present a novel algorithm that uses space linear in the number of 1-itemsets to mine all interesting itemsets in a single pass over the data, with no candidate generation. It scales (roughly) linearly in running time with the number of interesting itemsets. FIM experiments show that it outperforms FPGrowth on realistic datasets above a small support threshold (0.29% and 1.2% in our experiments)1.
UR - http://www.scopus.com/inward/record.url?scp=49549103257&partnerID=8YFLogxK
U2 - 10.1109/ICDM.2006.75
DO - 10.1109/ICDM.2006.75
M3 - Conference contribution
AN - SCOPUS:49549103257
SN - 0769527019
SN - 9780769527017
T3 - Proceedings - IEEE International Conference on Data Mining, ICDM
SP - 655
EP - 666
BT - Proceedings - Sixth International Conference on Data Mining, ICDM 2006
T2 - 6th International Conference on Data Mining, ICDM 2006
Y2 - 18 December 2006 through 22 December 2006
ER -