Abstract
This demonstration presents the Redoop infrastructure, the first full-fledged MapReduce framework with native support for recurring big data queries. Recurring queries, which are executed repeatedly over long periods of time on evolving high-volume data, have become a bedrock component of most large-scale data analytic applications. Redoop is a comprehensive extension to Hadoop that pushes the support and optimization of recurring queries into Hadoop's core functionality. While remaining backward compatible with regular MapReduce jobs, Redoop achieves an order of magnitude better performance than Hadoop for recurring workloads. To do so, Redoop employs innovative window-aware optimization techniques for such workloads, including adaptive window-aware data partitioning, cache-aware task scheduling, and inter-window caching mechanisms. We will demonstrate Redoop's capabilities on a compute cluster against real-life workloads, including click-stream and sensor data analysis.
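The idea behind inter-window caching can be illustrated with a small sketch. It is not Redoop's implementation or API: it assumes a generic pane-based scheme in which each window is split into fixed panes, per-pane partial counts are cached, and overlapping windows merge the cached partials instead of rescanning the data. The names `PaneCache`, `run_window`, and `slide` are hypothetical and chosen only for illustration.

```python
from collections import Counter


class PaneCache:
    """Hypothetical cache of per-pane partial aggregates (not Redoop's API)."""

    def __init__(self):
        self._partials = {}
        self.computed = 0  # number of panes actually aggregated (cache misses)

    def partial(self, pane_id, records):
        # Aggregate a pane only the first time it is requested.
        if pane_id not in self._partials:
            self._partials[pane_id] = Counter(records)
            self.computed += 1
        return self._partials[pane_id]

    def evict_before(self, pane_id):
        # Drop panes that no future window can cover.
        for old in [p for p in self._partials if p < pane_id]:
            del self._partials[old]


def run_window(cache, panes, start, size, slide=1):
    """Count items in panes[start:start + size] by merging cached partials."""
    total = Counter()
    for pane_id in range(start, start + size):
        total += cache.partial(pane_id, panes[pane_id])
    # The next window starts at start + slide, so earlier panes are expired.
    cache.evict_before(start + slide)
    return total


if __name__ == "__main__":
    # Assumed toy click-stream: one list of page ids per pane.
    panes = [["a", "b"], ["a"], ["c", "a"], ["b"]]
    cache = PaneCache()
    print(run_window(cache, panes, 0, 3))
    print(run_window(cache, panes, 1, 3))
    print("panes aggregated:", cache.computed)
```

In this sketch the second window overlaps the first in two of its three panes, so only the newly arrived pane is aggregated on the second run. A real system would also have to decide where cached partials live and how tasks are scheduled near them, which this example does not attempt to model.
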
Original language | English |
---|---|
Pages (from-to) | 1589-1592 |
Number of pages | 4 |
Journal | Proceedings of the VLDB Endowment |
Volume | 7 |
Issue number | 13 |
Publication status | Published - 2014 |
Externally published | Yes |
Event | Proceedings of the 40th International Conference on Very Large Data Bases, VLDB 2014 - Hangzhou, China. Duration: 1 Sept 2014 → 5 Sept 2014 |