TY - GEN
T1 - Accelerating batch analytics with residual resources from interactive clouds
AU - Clay, R. Benjamin
AU - Shen, Zhiming
AU - Ma, Xiaosong
PY - 2013
Y1 - 2013
AB - The popularity of cloud-based interactive computing services (e.g., virtual desktops) brings new management challenges. Each interactive user leaves abundant but fluctuating residual resources while being intolerant to latency, precluding the use of aggressive VM consolidation. In this paper, we present the Resource Harvester for Interactive Clouds (RHIC), an autonomous management framework that harnesses dynamic residual resources aggressively without slowing the harvested interactive services. RHIC builds ad-hoc clusters for running throughput-oriented 'background' workloads using a hybrid of residual and dedicated resources. These hybrid clusters offer significant gains over normal dedicated clusters: 20-40% cost and 20-29% energy savings in our test bed. For a given background job, RHIC intelligently discovers and maintains the ideal cluster size and composition, to meet user-specified goals such as cost/energy minimization or deadlines. RHIC employs black-box workload performance modeling, requiring only system-level metrics and incorporating techniques to improve modeling accuracy with bursty and heterogeneous residual resources. We demonstrate the effectiveness and adaptivity of our RHIC prototype with two parallel data analytics frameworks, Hadoop and HBase. Our results show that RHIC finds near-ideal cluster sizes and compositions across a wide range of workload/goal combinations.
KW - Adaptive systems
KW - Distributed computing
KW - Performance analysis
UR - http://www.scopus.com/inward/record.url?scp=84894578522&partnerID=8YFLogxK
U2 - 10.1109/MASCOTS.2013.63
DO - 10.1109/MASCOTS.2013.63
M3 - Conference contribution
AN - SCOPUS:84894578522
SN - 9780769551029
T3 - Proceedings - IEEE Computer Society's Annual International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunications Systems, MASCOTS
SP - 414
EP - 423
BT - Proceedings - 2013 IEEE 21st International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems, MASCOTS 2013
T2 - 2013 IEEE 21st International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems, MASCOTS 2013
Y2 - 14 August 2013 through 16 August 2013
ER -