Bermuda: An efficient mapreduce triangle listing algorithm for web-scale graphs

Dongqing Xiao*, Mohamed Eltabakh, Xiangnan Kong

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

1 Citation (Scopus)

Abstract

Triangle listing plays an important role in graph analysis and has numerous graph mining applications. With the rapid growth of graph data, distributed methods for listing triangles over massive graphs are urgently needed. Therefore, the tri-angle listing problem has been studied in several distributed infrastructures including MapReduce. However, existing al-gorithms suffer from generating and shuffing huge amounts of intermediate data, where interestingly, a large percent- age of this data is redundant. Inspired by this observation, we present the "Bermuda" method, an efficient MapReduce-based triangle listing technique for massive graphs. Different from existing approaches, Bermuda effectively reduces the size of the intermediate data via redundancy elimination and sharing of messages whenever possible. As a result, Bermuda achieves orders-of-magnitudes of speedup and enables processing larger graphs that other techniques fail to process under the same resources. Bermuda exploits the locality of processing, i.e., in which reduce instance each graph vertex will be processed, to avoid the redundancy of generating messages from mappers to reducers. Bermuda also proposes novel message sharing techniques within each reduce instance to increase the usability of the received messages. We present and analyze several reduce-side caching strategies that dynamically learn the expected access patterns of the shared messages, and adaptively deploy the appropriate technique for better sharing. Extensive experiments conducted on real-world large-scale graphs show that Bermuda speeds up the triangle listing computations by factors up to 10x. Moreover, with a relatively small cluster, Bermuda can scale up to large datasets, e.g., ClueWeb graph dataset (688GB), while other techniques fail to finish.

Original languageEnglish
Title of host publicationScientific and Statistical Database Management
Subtitle of host publication28th International Conference, SSDBM 2016 - Proceedings
EditorsIoana Manolescu-Goujot, Luca Trani, Peter Baumann, Laszlo Dobos, Yannis Ioannidis, Gergely Gabor Barnafoldi, Evelin Banyai
PublisherAssociation for Computing Machinery
ISBN (Electronic)9781450342155
DOIs
Publication statusPublished - 18 Jul 2016
Externally publishedYes
Event28th International Conference on Scientific and Statistical Database Management, SSDBM 2016 - Budapest, Hungary
Duration: 18 Jul 201620 Jul 2016

Publication series

NameACM International Conference Proceeding Series
Volume18-20-July-2016

Conference

Conference28th International Conference on Scientific and Statistical Database Management, SSDBM 2016
Country/TerritoryHungary
CityBudapest
Period18/07/1620/07/16

Keywords

  • Distributed Triangle Listing
  • Graph Analytics
  • MapReduce

Fingerprint

Dive into the research topics of 'Bermuda: An efficient mapreduce triangle listing algorithm for web-scale graphs'. Together they form a unique fingerprint.

Cite this