A summarization paradigm for big data

Zubair Shah*, Abdun Naser Mahmood

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

3 Citations (Scopus)

Abstract

We have developed an efficient summarization paradigm for data drawn from a hierarchical domain that constructs a succinct view of important large-valued regions ('heavy hitters'). It requires one pass over the data with a moderate number of updates per data element, and it requires less memory than existing approaches for approximating hierarchically discounted frequency counts of heavy hitters with provable guarantees. The proposed technique is generic and can make use of existing state-of-the-art sketch-based or count-based frequency estimation approaches. Any algorithm from either of these families can be plugged into the proposed framework as a subroutine without substantial modification. Both experimental and theoretical justifications are provided for its significance.
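
To make the terms in the abstract concrete, the following is a minimal sketch of hierarchical heavy hitters with discounted counts over a one-dimensional prefix hierarchy. It is not the authors' algorithm, which this record does not describe. It assumes elements are tuples whose prefixes form the hierarchy, and it uses a Misra-Gries counter as a stand-in for the pluggable count-based estimator. All names (MisraGries, summarize, hierarchical_heavy_hitters) are hypothetical.

```python
class MisraGries:
    """Count-based frequency estimator holding at most k counters (stand-in subroutine)."""

    def __init__(self, k):
        self.k = k
        self.counts = {}

    def update(self, item):
        if item in self.counts:
            self.counts[item] += 1
        elif len(self.counts) < self.k:
            self.counts[item] = 1
        else:
            # No free counter: decrement every counter and drop those that reach zero.
            for key in list(self.counts):
                self.counts[key] -= 1
                if self.counts[key] == 0:
                    del self.counts[key]

    def estimate(self, item):
        return self.counts.get(item, 0)

    def candidates(self):
        return list(self.counts)


def summarize(stream, depth, make_estimator):
    """One pass over the stream; each element updates one estimator per hierarchy level."""
    levels = [make_estimator() for _ in range(depth + 1)]
    n = 0
    for element in stream:
        n += 1
        for level in range(depth + 1):
            levels[level].update(element[:level])  # prefix of length `level`
    return levels, n


def hierarchical_heavy_hitters(levels, n, phi):
    """Bottom-up extraction of prefixes whose discounted count is at least phi * n."""
    threshold = phi * n
    covered = {}  # prefix -> count already attributed to heavy-hitter descendants
    result = {}
    for level in range(len(levels) - 1, -1, -1):
        estimator = levels[level]
        nodes = set(estimator.candidates()) | {p for p in covered if len(p) == level}
        for prefix in nodes:
            full = estimator.estimate(prefix)
            discounted = full - covered.get(prefix, 0)
            if discounted >= threshold:
                result[prefix] = discounted
                passed_up = full  # ancestors must discount this whole subtree
            else:
                passed_up = covered.get(prefix, 0)
            if level > 0:
                parent = prefix[:-1]
                covered[parent] = covered.get(parent, 0) + passed_up
    return result


# Illustrative data only: two-level elements such as (group, item).
stream = [("a", "x")] * 5 + [("a", "y")] * 2 + [("a", "z")] * 2 + [("b", "x")]
levels, n = summarize(stream, depth=2, make_estimator=lambda: MisraGries(16))
print(hierarchical_heavy_hitters(levels, n, phi=0.3))
```

In this toy stream the prefix ("a",) has a full count of 9, but 5 of those belong to the heavy hitter ("a", "x"), so its discounted count is 4. Swapping MisraGries for another estimator with the same three methods leaves the rest of the sketch unchanged, which is the kind of pluggability the abstract refers to.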

Original language: English
Title of host publication: Proceedings - 2014 IEEE International Conference on Big Data, IEEE Big Data 2014
Editors: Jimmy Lin, Jian Pei, Xiaohua Tony Hu, Wo Chang, Raghunath Nambiar, Charu Aggarwal, Nick Cercone, Vasant Honavar, Jun Huan, Bamshad Mobasher, Saumyadipta Pyne
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 61-63
Number of pages: 3
ISBN (Electronic): 9781479956654
Publication status: Published - 2014
Externally published: Yes
Event: 2nd IEEE International Conference on Big Data, IEEE Big Data 2014 - Washington, United States
Duration: 27 Oct 2014 to 30 Oct 2014

Publication series

Name: Proceedings - 2014 IEEE International Conference on Big Data, IEEE Big Data 2014

Conference

Conference: 2nd IEEE International Conference on Big Data, IEEE Big Data 2014
Country/Territory: United States
City: Washington
Period: 27/10/14 to 30/10/14

Keywords

  • Big Data
  • Data Summarization
  • Hierarchical Heavy Hitters
