TY - GEN
T1 - Scalable Time Series Compound Infrastructure
AU - Alghamdi, Noura S.
AU - Zhang, Liang
AU - Rundensteiner, Elke A.
AU - Eltabakh, Mohamed Y.
N1 - Publisher Copyright:
© 2022 ACM.
PY - 2022/6/10
Y1 - 2022/6/10
N2 - Objects ranging from a patient's history of medical tests to an IoT device's series of sensor maintenance records leave digital traces in the form of big time series. These time series objects do not only span exceedingly long time periods (sometimes years), but are also characterized by intermittent yet interrelated time series measurements punctuated by long gaps of silence. This prevalent data type, which we refer to as Time Series Compound objects (or, TSC), has been largely overlooked in the literature. Unique challenges arise when managing, querying and analyzing repositories of these big TSC objects. These include appropriate similarity semantics with time misalignment resiliency, efficient storage of excessively long and complex objects, and TSC-holistic indexing. We demonstrate that state-of-the-art time series systems, although effective at indexing and searching regular time series data, fail to support such big TSC data. In this work, we introduce the first comprehensive solution for managing TSC objects as first class citizen. We introduce new similarity-match semantics as well as a compact misalignment-resilient representation for TSCs. Upon this foundation, we then design a TSC-aware distributed indexing infrastructure Sloth that supports scalable storage, indexing and querying of TB-scale TSC datasets. Our experimental study demonstrates that for TB-scale datasets, the query response time of Sloth is up to one order of magnitude faster than that of existing systems, while the mean average precision (mAP) for approximate kNN similarity match query results by Sloth is 70% more accurate than existing solutions.
AB - Objects ranging from a patient's history of medical tests to an IoT device's series of sensor maintenance records leave digital traces in the form of big time series. These time series objects do not only span exceedingly long time periods (sometimes years), but are also characterized by intermittent yet interrelated time series measurements punctuated by long gaps of silence. This prevalent data type, which we refer to as Time Series Compound objects (or, TSC), has been largely overlooked in the literature. Unique challenges arise when managing, querying and analyzing repositories of these big TSC objects. These include appropriate similarity semantics with time misalignment resiliency, efficient storage of excessively long and complex objects, and TSC-holistic indexing. We demonstrate that state-of-the-art time series systems, although effective at indexing and searching regular time series data, fail to support such big TSC data. In this work, we introduce the first comprehensive solution for managing TSC objects as first class citizen. We introduce new similarity-match semantics as well as a compact misalignment-resilient representation for TSCs. Upon this foundation, we then design a TSC-aware distributed indexing infrastructure Sloth that supports scalable storage, indexing and querying of TB-scale TSC datasets. Our experimental study demonstrates that for TB-scale datasets, the query response time of Sloth is up to one order of magnitude faster than that of existing systems, while the mean average precision (mAP) for approximate kNN similarity match query results by Sloth is 70% more accurate than existing solutions.
KW - KNN approximate query
KW - distributed indexing
KW - similarity search
KW - sloth
KW - time series compound
UR - http://www.scopus.com/inward/record.url?scp=85132784867&partnerID=8YFLogxK
U2 - 10.1145/3514221.3517888
DO - 10.1145/3514221.3517888
M3 - Conference contribution
AN - SCOPUS:85132784867
T3 - Proceedings of the ACM SIGMOD International Conference on Management of Data
SP - 1685
EP - 1698
BT - SIGMOD 2022 - Proceedings of the 2022 International Conference on Management of Data
PB - Association for Computing Machinery
T2 - 2022 ACM SIGMOD International Conference on the Management of Data, SIGMOD 2022
Y2 - 12 June 2022 through 17 June 2022
ER -