TY - JOUR
T1 - Computing hierarchical summary from two-dimensional big data streams
AU - Shah, Zubair
AU - Mahmood, Abdun Naser
AU - Barlow, Michael
AU - Tari, Zahir
AU - Yi, Xun
AU - Zomaya, Albert Y.
N1 - Publisher Copyright:
© 1990-2012 IEEE.
PY - 2018/4/1
Y1 - 2018/4/1
N2 - There are many application domains, where hierarchical data is inherent, but surprisingly, there are few techniques for mining patterns from such important data. Hierarchical Heavy Hitters (HHH) and multilevel and Cross-Level Association Rules (CLAR) mining are well-known hierarchical pattern mining techniques. The problem in these techniques; however, is that they focus on capturing only global patterns from data but cannot identify local contextual patterns. Another problem in these techniques is that they treat all data items in the transaction equally and do not consider the sequential nature of the relationship among items within a transaction; hence, they cannot capture the correlation semantic within the transactions of the data items. There are many applications such as clickstream mining, healthcare data mining, network monitoring, and recommender systems, which require to identify local contextual patterns and correlation semantics. In this work, we introduce a new concept, which can capture the sequential nature of the relationship between pairs of hierarchical items at multiple concept levels and can capture local contextual patterns within the context of the global patterns. We call this notion Hierarchically Correlated Heavy Hitters (HCHH). Specifically, the proposed approach finds the correlation between items corresponding to hierarchically discounted frequency counts. We have provided formal definitions of the proposed concept and developed algorithmic approaches for computing HCHH in data streams efficiently. The proposed HCHH algorithm have deterministic error guarantees, and space bounds. It requires epsilon memory, where is a small constant, and [0,1], [0,1] are user defined parameters on upper bounds of estimation error. We have compared the proposed HCHH concept with existing hierarchical pattern mining approaches both theoretically as well as experimentally.
AB - There are many application domains, where hierarchical data is inherent, but surprisingly, there are few techniques for mining patterns from such important data. Hierarchical Heavy Hitters (HHH) and multilevel and Cross-Level Association Rules (CLAR) mining are well-known hierarchical pattern mining techniques. The problem in these techniques; however, is that they focus on capturing only global patterns from data but cannot identify local contextual patterns. Another problem in these techniques is that they treat all data items in the transaction equally and do not consider the sequential nature of the relationship among items within a transaction; hence, they cannot capture the correlation semantic within the transactions of the data items. There are many applications such as clickstream mining, healthcare data mining, network monitoring, and recommender systems, which require to identify local contextual patterns and correlation semantics. In this work, we introduce a new concept, which can capture the sequential nature of the relationship between pairs of hierarchical items at multiple concept levels and can capture local contextual patterns within the context of the global patterns. We call this notion Hierarchically Correlated Heavy Hitters (HCHH). Specifically, the proposed approach finds the correlation between items corresponding to hierarchically discounted frequency counts. We have provided formal definitions of the proposed concept and developed algorithmic approaches for computing HCHH in data streams efficiently. The proposed HCHH algorithm have deterministic error guarantees, and space bounds. It requires epsilon memory, where is a small constant, and [0,1], [0,1] are user defined parameters on upper bounds of estimation error. We have compared the proposed HCHH concept with existing hierarchical pattern mining approaches both theoretically as well as experimentally.
KW - Hierarchical patterns
KW - association rules
KW - frequent patterns
UR - http://www.scopus.com/inward/record.url?scp=85037620825&partnerID=8YFLogxK
U2 - 10.1109/TPDS.2017.2778734
DO - 10.1109/TPDS.2017.2778734
M3 - Article
AN - SCOPUS:85037620825
SN - 1045-9219
VL - 29
SP - 803
EP - 818
JO - IEEE Transactions on Parallel and Distributed Systems
JF - IEEE Transactions on Parallel and Distributed Systems
IS - 4
ER -