Computing discounted multidimensional hierarchical aggregates using modified Misra Gries algorithm

Zubair Shah*, Abdun Naser Mahmood, Michael Barlow

*Corresponding author for this work

Research output: Contribution to journalArticlepeer-review

3 Citations (Scopus)

Abstract

Abstract Finding the "Top k" list or heavy hitters is an important function in many computing applications, including database joins, data warehousing (e.g., OLAP), web caching and hits, network usage monitoring, and detecting DDoS attacks. While most applications work on traditional "flat" data, many domains contain data that has attributes which can take values from different hierarchies e.g., time and geographic location, source and destination IP addresses, etc. In addition, hierarchical heavy hitters generated from hierarchical attributes offer insightful and a generalized view of the data. When data arrives in a stream, infrequent data elements need to be deleted due to space constraints. However, unlike traditional heavy hitters which do not consider deleted elements, in a hierarchical heavy hitter some of these deleted elements could form a heavy hitter at a higher level of the hierarchy. Furthermore, unlike traditional heavy hitters, hierarchical heavy hitters have ancestor-descendant relationship which requires the count of descendant heavy hitters to be discounted at the higher levels of the hierarchy. This is particularly challenging in a streaming environment because all the incoming data items cannot be stored or revisited. This problem is generally addressed by accepting an error constraint ∈ on the precision of the count of hierarchical heavy hitters, since accurate count cannot be guaranteed under the constrained memory requirement posed by the streaming environment. In this work, we propose a new streaming ∈-approximation algorithm (HHH-MG) for computing Hierarchical Heavy Hitters based on a modified Misra Gries heavy hitter algorithm. The proposed algorithm guarantees ∈-approximation precision with improved worst-case time and space bounds compared to previous algorithms. It requires overall O(Formula presented.) space, and O(η) updates per element of the data, where η is a small constant. We provide theoretical proofs for the space and time requirements. We have also experimentally compared the proposed algorithm with benchmark techniques. Experimental results demonstrate that the proposed algorithm requires fewer updates per element of data, and on average requires less memory. For the experimental validation, we have used both synthetic data derived from an open source generator, and real benchmark datasets from an international Internet Service Provider.

Original languageEnglish
Article number1826
Pages (from-to)170-186
Number of pages17
JournalPerformance Evaluation
Volume91
DOIs
Publication statusPublished - 1 Sept 2015
Externally publishedYes

Keywords

  • Data streams
  • Heavy hitters
  • Hierarchical heavy hitters
  • Network traffic summarization

Fingerprint

Dive into the research topics of 'Computing discounted multidimensional hierarchical aggregates using modified Misra Gries algorithm'. Together they form a unique fingerprint.

Cite this