Computing hierarchical summary of the data streams

Zubair Shah*, Abdun Naser Mahmood, Michael Barlow

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

3 Citations (Scopus)

Abstract

Data stream processing is an important function in many online applications such as network traffic analysis, web applications, and financial data analysis. Computing summaries of data stream is challenging since streaming data is generally unbounded, and cannot be permanently stored or accessed more than once. In this paper, we have proposed two counter based hierarchical (CHS) ε-approximation algorithms to create hierarchical summaries of one dimensional data. CHS maintains a data structure, where each entry contains the incoming data item and an associated counter to store its frequency. Since every item in streaming data cannot be stored, CHS only maintains frequent items (known as hierarchical heavy hitters) at various levels of generalization hierarchy by exploiting the natural hierarchy of the data. The algorithm guarantees accuracy of count within an ε bound. Furthermore, using aperiodic (CHS-A) and periodic (CHS-P) compression strategy the proposed technique offers improved space complexities of O(η/ε) and O(η/ε log εN), respectively. We provide theoretical proofs for both space and time requirements of CHS algorithm. We have also experimentally compared the proposed algorithm with the existing benchmark techniques. Experimental results show that the proposed algorithm requires fewer updates per element of data, and uses a moderate amount of bounded memory. Moreover, precision-recall analysis demonstrates that CHS algorithm provides a high quality output compared to existing benchmark techniques. For the experimental validation, we have used both synthetic data derived from an open source generator, and real benchmark data sets from an international Internet Service Provider.

Original languageEnglish
Title of host publicationAdvances in Knowledge Discovery and Data Mining - 20th Pacific-Asia Conference, PAKDD 2016, Proceedings
EditorsJames Bailey, Latifur Khan, Takashi Washio, Gillian Dobbie, Joshua Zhexue Huang, Ruili Wang
PublisherSpringer Verlag
Pages168-179
Number of pages12
ISBN (Print)9783319317496
DOIs
Publication statusPublished - 2016
Externally publishedYes
Event20th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, PAKDD 2016 - Auckland, New Zealand
Duration: 19 Apr 201622 Apr 2016

Publication series

NameLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume9652 LNAI
ISSN (Print)0302-9743
ISSN (Electronic)1611-3349

Conference

Conference20th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, PAKDD 2016
Country/TerritoryNew Zealand
CityAuckland
Period19/04/1622/04/16

Keywords

  • Data streams
  • Heavy hitters
  • Hierarchical heavy hitters
  • Network traffic summarization

Fingerprint

Dive into the research topics of 'Computing hierarchical summary of the data streams'. Together they form a unique fingerprint.

Cite this