TY - GEN
T1 - DRS+
T2 - 22nd IEEE International Conference on High Performance Computing and Communications, 18th IEEE International Conference on Smart City and 6th IEEE International Conference on Data Science and Systems, HPCC-SmartCity-DSS 2020
AU - Tang, Kailin
AU - Hao, Zhifeng
AU - Cai, Ruichu
AU - Fu, Tom Z.J.
AU - Yang, Yin
AU - Wang, Li
AU - Winslett, Marianne
AU - Zhang, Zhenjie
N1 - Publisher Copyright: © 2020 IEEE.
PY - 2020/12
Y1 - 2020/12
N2 - Distributed stream processing is nowadays a leading paradigm for managing massive streaming data and performing real-time analytics on such streams. Since data volume and distribution in the input streams often change over time, a dynamic resource scheduler is often employed to ensure that the system meets response time constraints while being cost-effective in terms of resource usage. Currently, resource schedulers in distributed streaming systems generally assume that each input must be completely processed, and allocate resources accordingly. In practice, this assumption can often be relaxed, since many stream analytics tasks do not require exact outputs. To our knowledge, however, no existing resource scheduler takes this fact into consideration, leading to unnecessary over-provisioning of resources. This paper presents DRS+, a novel dynamic resource scheduler that integrates load shedding into resource auto-scaling strategies. DRS+ is based on a unified model that establishes the relationship between response time, result accuracy and resource consumption, given the current workload statistics. Using this model, DRS+ computes the best resource allocation plan and load shedding strategy, and executes them through an efficient protocol that minimizes the computation and communication overhead at each operator. We have implemented DRS+ based on Apache Storm, and evaluated it using a real dataset. The results demonstrate that DRS+ achieves low resource consumption and high result utility, while satisfying real-time response constraints.
AB - Distributed stream processing is nowadays a leading paradigm for managing massive streaming data and performing real-time analytics on such streams. Since data volume and distribution in the input streams often change over time, a dynamic resource scheduler is often employed to ensure that the system meets response time constraints while being cost-effective in terms of resource usage. Currently, resource schedulers in distributed streaming systems generally assume that each input must be completely processed, and allocate resources accordingly. In practice, this assumption can often be relaxed, since many stream analytics tasks do not require exact outputs. To our knowledge, however, no existing resource scheduler takes this fact into consideration, leading to unnecessary over-provisioning of resources. This paper presents DRS+, a novel dynamic resource scheduler that integrates load shedding into resource auto-scaling strategies. DRS+ is based on a unified model that establishes the relationship between response time, result accuracy and resource consumption, given the current workload statistics. Using this model, DRS+ computes the best resource allocation plan and load shedding strategy, and executes them through an efficient protocol that minimizes the computation and communication overhead at each operator. We have implemented DRS+ based on Apache Storm, and evaluated it using a real dataset. The results demonstrate that DRS+ achieves low resource consumption and high result utility, while satisfying real-time response constraints.
UR - http://www.scopus.com/inward/record.url?scp=85105309122&partnerID=8YFLogxK
U2 - 10.1109/HPCC-SmartCity-DSS50907.2020.00036
DO - 10.1109/HPCC-SmartCity-DSS50907.2020.00036
M3 - Conference contribution
AN - SCOPUS:85105309122
T3 - Proceedings - 2020 IEEE 22nd International Conference on High Performance Computing and Communications, IEEE 18th International Conference on Smart City and IEEE 6th International Conference on Data Science and Systems, HPCC-SmartCity-DSS 2020
SP - 292
EP - 301
BT - Proceedings - 2020 IEEE 22nd International Conference on High Performance Computing and Communications, IEEE 18th International Conference on Smart City and IEEE 6th International Conference on Data Science and Systems, HPCC-SmartCity-DSS 2020
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 14 December 2020 through 16 December 2020
ER -