Redoop infrastructure for recurring big data queries

Chuan Lei, Zhongfang Zhuang, Elke A. Rundensteiner, Mohamed Y. Eltabakh

Research output: Contribution to journal, Conference article, peer-review

7 Citations (Scopus)

Abstract

This demonstration presents the Redoop infrastructure, the first full-fledged MapReduce framework with native support for recurring big data queries. Recurring queries, executed repeatedly for long periods of time over evolving high-volume data, have become a bedrock component of most large-scale data analytic applications. Redoop is a comprehensive extension to Hadoop that pushes the support and optimization of recurring queries into Hadoop's core functionality. While backward compatible with regular MapReduce jobs, Redoop achieves an order of magnitude better performance than Hadoop for recurring workloads. Redoop employs innovative window-aware optimization techniques for such recurring workloads, including adaptive window-aware data partitioning, cache-aware task scheduling, and inter-window caching mechanisms. We will demonstrate Redoop's capabilities on a compute cluster against real-life workloads, including click-stream and sensor data analysis.

Original language: English
Pages (from-to): 1589-1592
Number of pages: 4
Journal: Proceedings of the VLDB Endowment
Volume: 7
Issue number: 13
Publication status: Published - 2014
Externally published: Yes
Event: Proceedings of the 40th International Conference on Very Large Data Bases, VLDB 2014 - Hangzhou, China
Duration: 1 Sept 2014 to 5 Sept 2014
