TY - GEN
T1 - Bermuda
T2 - 28th International Conference on Scientific and Statistical Database Management, SSDBM 2016
AU - Xiao, Dongqing
AU - Eltabakh, Mohamed
AU - Kong, Xiangnan
N1 - Publisher Copyright:
© 2016 ACM.
PY - 2016/7/18
Y1 - 2016/7/18
N2 - Triangle listing plays an important role in graph analysis and has numerous graph mining applications. With the rapid growth of graph data, distributed methods for listing triangles over massive graphs are urgently needed. Therefore, the tri-angle listing problem has been studied in several distributed infrastructures including MapReduce. However, existing al-gorithms suffer from generating and shuffing huge amounts of intermediate data, where interestingly, a large percent- age of this data is redundant. Inspired by this observation, we present the "Bermuda" method, an efficient MapReduce-based triangle listing technique for massive graphs. Different from existing approaches, Bermuda effectively reduces the size of the intermediate data via redundancy elimination and sharing of messages whenever possible. As a result, Bermuda achieves orders-of-magnitudes of speedup and enables processing larger graphs that other techniques fail to process under the same resources. Bermuda exploits the locality of processing, i.e., in which reduce instance each graph vertex will be processed, to avoid the redundancy of generating messages from mappers to reducers. Bermuda also proposes novel message sharing techniques within each reduce instance to increase the usability of the received messages. We present and analyze several reduce-side caching strategies that dynamically learn the expected access patterns of the shared messages, and adaptively deploy the appropriate technique for better sharing. Extensive experiments conducted on real-world large-scale graphs show that Bermuda speeds up the triangle listing computations by factors up to 10x. Moreover, with a relatively small cluster, Bermuda can scale up to large datasets, e.g., ClueWeb graph dataset (688GB), while other techniques fail to finish.
AB - Triangle listing plays an important role in graph analysis and has numerous graph mining applications. With the rapid growth of graph data, distributed methods for listing triangles over massive graphs are urgently needed. Therefore, the tri-angle listing problem has been studied in several distributed infrastructures including MapReduce. However, existing al-gorithms suffer from generating and shuffing huge amounts of intermediate data, where interestingly, a large percent- age of this data is redundant. Inspired by this observation, we present the "Bermuda" method, an efficient MapReduce-based triangle listing technique for massive graphs. Different from existing approaches, Bermuda effectively reduces the size of the intermediate data via redundancy elimination and sharing of messages whenever possible. As a result, Bermuda achieves orders-of-magnitudes of speedup and enables processing larger graphs that other techniques fail to process under the same resources. Bermuda exploits the locality of processing, i.e., in which reduce instance each graph vertex will be processed, to avoid the redundancy of generating messages from mappers to reducers. Bermuda also proposes novel message sharing techniques within each reduce instance to increase the usability of the received messages. We present and analyze several reduce-side caching strategies that dynamically learn the expected access patterns of the shared messages, and adaptively deploy the appropriate technique for better sharing. Extensive experiments conducted on real-world large-scale graphs show that Bermuda speeds up the triangle listing computations by factors up to 10x. Moreover, with a relatively small cluster, Bermuda can scale up to large datasets, e.g., ClueWeb graph dataset (688GB), while other techniques fail to finish.
KW - Distributed Triangle Listing
KW - Graph Analytics
KW - MapReduce
UR - http://www.scopus.com/inward/record.url?scp=84982179858&partnerID=8YFLogxK
U2 - 10.1145/2949689.2949715
DO - 10.1145/2949689.2949715
M3 - Conference contribution
AN - SCOPUS:84982179858
T3 - ACM International Conference Proceeding Series
BT - Scientific and Statistical Database Management
A2 - Manolescu-Goujot, Ioana
A2 - Trani, Luca
A2 - Baumann, Peter
A2 - Dobos, Laszlo
A2 - Ioannidis, Yannis
A2 - Barnafoldi, Gergely Gabor
A2 - Banyai, Evelin
PB - Association for Computing Machinery
Y2 - 18 July 2016 through 20 July 2016
ER -