EM∗: An EM algorithm for big data

Hasan Kurban, Mark Jenne, Mehmet M. Dalkilic

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

7 Citations (Scopus)

Abstract

Existing data mining techniques, particularly iterative learning algorithms, become overwhelmed by big data. While parallelism is an obvious and usually necessary strategy, we observe that (1) continually revisiting data and (2) visiting all data are two of the most prominent problems, especially for iterative, unsupervised algorithms such as the Expectation Maximization algorithm for clustering (EM-T). Our strategy is to embed EM-T into a non-linear hierarchical data structure (a heap) that allows us to (1) separate data that needs to be revisited from data that does not and (2) narrow the iteration toward the data that is more difficult to cluster. We call this extended EM-T EM∗. We show that EM∗ outperforms EM-T on large real-world and synthetic data sets. Lastly, we conclude with some theoretical underpinnings that explain why EM∗ is successful.
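
The record does not give the details of EM∗, so the following is only a minimal illustrative sketch of the general idea in the abstract, not the authors' algorithm. It fits a one-dimensional Gaussian mixture and, after one full pass, uses a heap-based selection to revisit only the points whose cluster membership is least certain. The function name `heap_guided_em`, the `revisit_fraction` parameter, and the use of the maximum responsibility as a "difficulty" score are all assumptions made for illustration.

```python
import heapq
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def responsibilities(x, weights, means, variances):
    """E-step for one point: posterior probability of each component."""
    dens = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
    total = sum(dens) or 1e-300  # guard against division by zero on underflow
    return [d / total for d in dens]


def heap_guided_em(data, initial_means, n_iter=20, revisit_fraction=0.3):
    """Hypothetical sketch: EM in which only the least certain points are revisited.

    This is NOT the published EM* procedure; it only illustrates separating
    points that need revisiting from points that do not.
    """
    k = len(initial_means)
    means = list(initial_means)
    weights = [1.0 / k] * k
    variances = [1.0] * k

    # One full E-step so that every point has stored responsibilities.
    resp = [responsibilities(x, weights, means, variances) for x in data]
    n_revisit = max(1, int(revisit_fraction * len(data)))

    for _ in range(n_iter):
        # M-step: uses the stored responsibilities of all points.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # leave an empty component unchanged
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)
            weights[j] = nj / len(data)

        # Partial E-step: heap-based selection of the points whose best
        # responsibility is lowest, i.e. the hardest to cluster.
        hard = heapq.nsmallest(n_revisit, range(len(data)), key=lambda i: max(resp[i]))
        for i in hard:
            resp[i] = responsibilities(data[i], weights, means, variances)

    return means, variances, weights
```

In this sketch, confidently assigned points keep their stored responsibilities, so each iteration after the first recomputes the E-step for only a fraction of the data. Whether and how EM∗ bounds the revisited set is described in the paper itself, not in this record.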

Original language: English
Title of host publication: Proceedings - 3rd IEEE International Conference on Data Science and Advanced Analytics, DSAA 2016
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 312-320
Number of pages: 9
ISBN (Electronic): 9781509052066
Publication status: Published - 22 Dec 2016
Externally published: Yes
Event: 3rd IEEE International Conference on Data Science and Advanced Analytics, DSAA 2016 - Montreal, Canada
Duration: 17 Oct 2016 to 19 Oct 2016

Publication series

Name: Proceedings - 3rd IEEE International Conference on Data Science and Advanced Analytics, DSAA 2016

Conference

Conference: 3rd IEEE International Conference on Data Science and Advanced Analytics, DSAA 2016
Country/Territory: Canada
City: Montreal
Period: 17/10/16 to 19/10/16

Keywords

  • Big data
  • Clustering
  • Data mining
  • EM
  • Expectation maximization
  • Heap
