Big Data Series Analytics Using TARDIS and its Exploitation in Geospatial Applications

Liang Zhang, Noura Alghamdi, Mohamed Y. Eltabakh, Elke A. Rundensteiner

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

3 Citations (Scopus)

Abstract

The massive amounts of data series continuously generated and collected by applications require new indices to speed up the data series similarity queries on which various data mining techniques rely. However, state-of-the-art iSAX-based indexing techniques do not scale well, because their binary fanout leads to very deep index trees, and they suffer from accuracy degradation, because their character-level cardinality poorly preserves proximity. To address these problems, we recently proposed TARDIS, which supports indexing and querying billion-scale data series datasets. It introduces a new signature, iSAX-T, which reduces the cost of cardinality conversion, and a corresponding sigTree, which builds a compact index structure that better preserves similarity. The framework consists of one centralized index and local distributed indices to efficiently re-partition and index dimensional datasets. In addition, it proposes effective query strategies based on the sigTree structure that greatly improve accuracy. In this demonstration, we present GENET, a new interactive exploration demonstration that lets users perform Big Data Series Approximate Retrieval and Recursive Interactive Clustering on large-scale geospatial datasets using TARDIS index techniques.
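The abstract contrasts iSAX's character-level cardinality with the word-level cardinality of iSAX-T signatures. The Python sketch below illustrates that contrast under a loudly stated assumption: it models an iSAX-T-style signature as a bit transposition of a standard iSAX word, where level i collects the i-th most significant bit of every symbol, so lowering the cardinality of the whole word becomes a prefix cut instead of a per-character shift. This layout and the names `sax_word`, `reduce_symbols`, `transpose_signature`, and `reduce_cardinality` are hypothetical illustrations, not the TARDIS implementation; only z-normalization, PAA, and SAX with equiprobable Gaussian breakpoints are standard techniques.

```python
import bisect
from statistics import NormalDist


def znorm(series):
    """Z-normalize a series (zero mean, unit variance), as is standard before SAX."""
    n = len(series)
    mean = sum(series) / n
    std = (sum((x - mean) ** 2 for x in series) / n) ** 0.5
    if std == 0:
        return [0.0] * n
    return [(x - mean) / std for x in series]


def paa(series, w):
    """Piecewise Aggregate Approximation: the mean of each of w equal-length segments."""
    n = len(series)
    if n % w != 0:
        raise ValueError("this sketch requires len(series) to be divisible by w")
    seg = n // w
    return [sum(series[i * seg:(i + 1) * seg]) / seg for i in range(w)]


def sax_word(series, w, bits):
    """Map each PAA value to a symbol in [0, 2**bits) using equiprobable N(0, 1) breakpoints."""
    card = 2 ** bits
    nd = NormalDist()
    breakpoints = [nd.inv_cdf(k / card) for k in range(1, card)]
    return [bisect.bisect_right(breakpoints, v) for v in paa(znorm(series), w)]


def reduce_symbols(word, bits, new_bits):
    """Character-level reduction (iSAX-style): shift every symbol individually."""
    return [s >> (bits - new_bits) for s in word]


def transpose_signature(word, bits):
    """Hypothetical word-level layout (assumed model of iSAX-T): level i holds the
    i-th most significant bit of every symbol, so coarser cardinalities are prefixes."""
    return [
        "".join(str((s >> (bits - 1 - level)) & 1) for s in word)
        for level in range(bits)
    ]


def reduce_cardinality(levels, new_bits):
    """Word-level reduction under the assumed layout: keep the first new_bits levels."""
    return levels[:new_bits]


if __name__ == "__main__":
    series = [0.5, 1.0, -0.3, 2.2, -1.7, 0.0, 0.9, -2.1]
    word = sax_word(series, w=4, bits=3)
    sig = transpose_signature(word, bits=3)
    print("iSAX word (cardinality 8):", word)
    print("transposed levels:", sig)
    # Both routes yield the same cardinality-4 representation.
    print(reduce_cardinality(sig, 2) == transpose_signature(reduce_symbols(word, 3, 2), 2))
```

Because the breakpoints for cardinality 2^b are a subset of those for 2^(b+1), dropping low-order bits is a valid cardinality reduction in both layouts. Under the assumed transposed layout, the reduction touches one contiguous prefix rather than every character, which is one way to read the abstract's claim that iSAX-T lowers cardinality conversion cost; the actual TARDIS encoding may differ.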

Original language: English
Title of host publication: SIGMOD 2020 - Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data
Publisher: Association for Computing Machinery
Pages: 2785-2788
Number of pages: 4
ISBN (Electronic): 9781450367356
DOIs
Publication status: Published - 14 Jun 2020
Externally published: Yes
Event: 2020 ACM SIGMOD International Conference on Management of Data, SIGMOD 2020 - Portland, United States
Duration: 14 Jun 2020 to 19 Jun 2020

Publication series

Name: Proceedings of the ACM SIGMOD International Conference on Management of Data
ISSN (Print): 0730-8078

Conference

Conference: 2020 ACM SIGMOD International Conference on Management of Data, SIGMOD 2020
Country/Territory: United States
City: Portland
Period: 14/06/20 to 19/06/20

Keywords

  • GENET
  • KNN approximate query
  • TARDIS
  • approximate query processing
  • clustering
  • data series
  • distributed indexing
  • geospatial
  • iSAX-T
  • sigtree
  • word-level cardinality
