Shared execution of recurring workloads in MapReduce

Chuan Lei, Zhongfang Zhuang, Elke A. Rundensteiner, Mohamed Eltabakh

Research output: Contribution to journal › Conference article › peer-review

9 Citations (Scopus)

Abstract

With the increasing complexity of data-intensive MapReduce workloads, Hadoop must often accommodate hundreds or even thousands of recurring analytics queries that periodically execute over frequently updated datasets, e.g., latest stock transactions, new log files, or recent news feeds. For many applications, such recurring queries come with user-specified service-level agreements (SLAs), commonly expressed as the maximum allowed latency for producing results before their merits decay. The recurring nature of these emerging workloads, combined with their SLA constraints, makes it challenging to share and optimize their execution. While some recent efforts on multi-job optimization in MapReduce have emerged, they focus only on sharing work among ad-hoc jobs on static datasets. Unfortunately, these sharing techniques neither take the recurring nature of the queries into account nor guarantee the satisfaction of the SLA requirements. In this work, we propose the first scalable multi-query sharing engine tailored for recurring workloads in the MapReduce infrastructure, called "Helix". Helix deploys new sliced window-alignment techniques to create sharing opportunities among recurring queries without introducing additional I/O overheads or unnecessary data scans. Helix then introduces a cost/benefit model for creating a sharing plan among the recurring queries, and a scheduling strategy for executing them to maximize SLA satisfaction. Our experimental results over real-world datasets confirm that Helix outperforms state-of-the-art techniques by an order of magnitude.

Original language: English
Pages (from-to): 714-725
Number of pages: 12
Journal: Proceedings of the VLDB Endowment
Volume: 8
Issue number: 7
Publication status: Published - 2015
Externally published: Yes
Event: 41st International Conference on Very Large Data Bases, VLDB 2015 - Kohala Coast, United States
Duration: 31 Aug 2015 to 4 Sept 2015
