Redoop: Supporting Recurring Queries in Hadoop.

Chuan Lei, Elke A. Rundensteiner, Mohamed Ahmed Yassin Eltabakh

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

The growing demand for large-scale data analytics, ranging from online advertisement placement and log processing to fraud detection, has led to the design of highly scalable data-intensive computing infrastructures such as the Hadoop platform. Recurring queries, repeatedly executed for long periods of time on rapidly evolving high-volume data, have become a bedrock component in most of these analytic applications. Despite their importance, plain Hadoop and its state-of-the-art extensions lack built-in support for recurring queries. In particular, they lack efficient and scalable analytics over evolving datasets. In this work, we present the Redoop system, an extension of the Hadoop framework, designed to fill this void. Redoop supports recurring queries as first-class citizens in Hadoop without sacrificing any of its core features. More importantly, Redoop deploys innovative window-aware optimization techniques for recurring query execution, including adaptive window-aware data partitioning, window-aware task scheduling, and inter-window caching mechanisms. Redoop retains the fault tolerance of MapReduce via automatic cache recovery and task re-execution support. Our extensive experimental study with real datasets demonstrates that Redoop achieves significant runtime performance gains of up to 9x speedup compared to plain Hadoop.
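
The abstract names inter-window caching without describing its mechanics. The sketch below shows only the general idea, and it is not Redoop's implementation. It assumes the window is split into fixed-size panes, that the query is a simple per-key count, and that everything runs in a single Python process instead of on Hadoop. The class `RecurringWindowCount` and its members are hypothetical names chosen for illustration.

```python
from collections import Counter, deque


class RecurringWindowCount:
    """Hypothetical recurring per-key count over a sliding window of panes.

    Assumption: the window covers the last `window_panes` panes and slides
    by one pane per recurrence. Partial counts are cached per pane so that
    each recurrence scans only the newly arrived pane.
    """

    def __init__(self, window_panes):
        self.window_panes = window_panes
        self.cache = deque()  # cached per-pane partial counts, oldest first

    def run(self, new_pane):
        # Scan only the new pane; older panes are served from the cache.
        self.cache.append(Counter(new_pane))
        # Evict the pane that has slid out of the window.
        if len(self.cache) > self.window_panes:
            self.cache.popleft()
        # Merge the cached partial counts into the window result.
        total = Counter()
        for partial in self.cache:
            total.update(partial)
        return dict(total)


query = RecurringWindowCount(window_panes=2)
first = query.run(["a", "b", "a"])   # window: pane 1
second = query.run(["b"])            # window: panes 1 and 2
third = query.run(["c"])             # window: panes 2 and 3 (pane 1 evicted)
```

In this sketch the per-recurrence scan cost depends on the size of the new pane, not on the size of the whole window, which is the kind of saving a cache shared across overlapping windows is meant to provide.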
Original language: English
Title of host publication: Proceedings of the 17th International Conference on Extending Database Technology (EDBT)
Number of pages: 5
Publication status: Published - 2014
Externally published: Yes
