Multi-Tactic distance-based outlier detection

Lei Cao, Yizhou Yan, Caitlin Kuhlman, Qingyang Wang, Elke A. Rundensteiner, Mohamed Eltabakh

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

17 Citations (Scopus)

Abstract

As datasets grow radically in size, highly scalable algorithms that leverage modern distributed infrastructures are needed to detect outliers in massive datasets. In this work, we present DOD, the first distributed distance-based outlier detection approach built on MapReduce-based infrastructure. DOD features a single-pass execution framework that minimizes communication overhead. Furthermore, DOD overturns two fundamental assumptions widely adopted in the distributed analytics literature, namely cardinality-based load balancing and one algorithm for all data. The multi-tactic strategy of DOD achieves a truly balanced workload by taking the data characteristics into account during data partitioning, and it assigns the most appropriate algorithm to each partition based on our theoretical cost models established for distinct classes of detection algorithms. Thus, DOD effectively minimizes the end-to-end execution time. Our experimental study confirms the efficiency of DOD and its scalability to terabytes of data, outperforming the baseline solutions by a factor of 20.
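
The point about cardinality-based load balancing can be illustrated with a small single-machine sketch. It is not the DOD implementation: the abstract does not spell out the outlier definition or the detection algorithms, so the sketch assumes the common distance-threshold definition (a point is an outlier if fewer than `k` other points lie within distance `r`) and a plain nested-loop detector with early termination. The names `detect_outliers` and `make_partition` and all parameter values are hypothetical.

```python
import math
import random


def detect_outliers(points, r, k):
    """Return (outlier_indices, distance_computations) for one partition.

    Assumed definition: a point is an outlier when fewer than k other
    points lie within distance r. The inner loop stops as soon as k
    neighbors have been found.
    """
    outliers = []
    computations = 0
    for i, p in enumerate(points):
        neighbors = 0
        for j, q in enumerate(points):
            if i == j:
                continue
            computations += 1
            if math.dist(p, q) <= r:
                neighbors += 1
                if neighbors >= k:
                    break  # p is an inlier; no need to scan further
        if neighbors < k:
            outliers.append(i)
    return outliers, computations


def make_partition(n, spread, seed):
    """Hypothetical helper: n uniform random 2-D points in a square of side `spread`."""
    rng = random.Random(seed)
    return [(rng.uniform(0, spread), rng.uniform(0, spread)) for _ in range(n)]


# Two partitions with the same cardinality but very different density.
dense = make_partition(200, spread=1.0, seed=1)
sparse = make_partition(200, spread=100.0, seed=2)

dense_outliers, dense_cost = detect_outliers(dense, r=0.5, k=5)
sparse_outliers, sparse_cost = detect_outliers(sparse, r=0.5, k=5)

print("dense :", len(dense_outliers), "outliers,", dense_cost, "distance computations")
print("sparse:", len(sparse_outliers), "outliers,", sparse_cost, "distance computations")
```

Both partitions hold 200 points, yet the sparse one needs far more distance computations because almost no point can be cleared early. Under these assumptions, equal point counts do not imply equal work, which is the kind of imbalance a partitioning scheme driven by data characteristics is meant to avoid.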

Original language: English
Title of host publication: Proceedings - 2017 IEEE 33rd International Conference on Data Engineering, ICDE 2017
Publisher: IEEE Computer Society
Pages: 959-970
Number of pages: 12
ISBN (Electronic): 9781509065431
Publication status: Published - 16 May 2017
Externally published: Yes
Event: 33rd IEEE International Conference on Data Engineering, ICDE 2017 - San Diego, United States
Duration: 19 Apr 2017 to 22 Apr 2017

Publication series

Name: Proceedings - International Conference on Data Engineering
ISSN (Print): 1084-4627

Conference

Conference: 33rd IEEE International Conference on Data Engineering, ICDE 2017
Country/Territory: United States
City: San Diego
Period: 19/04/17 to 22/04/17
