Efficient intranode communication in GPU-accelerated systems

Feng Ji*, Ashwin M. Aji, James Dinan, Darius Buntinas, Pavan Balaji, Wu-chun Feng, Xiaosong Ma

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

12 Citations (Scopus)

Abstract

Current implementations of MPI are unaware of accelerator memory (i.e., GPU device memory) and require programmers to explicitly move data between memory spaces. This approach is inefficient, especially for intranode communication, where it can result in several extra copy operations. In this work, we integrate GPU awareness into a popular MPI runtime system and develop techniques to significantly reduce the cost of intranode communication involving one or more GPUs. Experimental results show up to a 2x increase in bandwidth, which translates to an average 4.3% improvement in the total execution time of a halo-exchange benchmark.
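The abstract contrasts the conventional, GPU-unaware MPI model, in which the programmer stages device data through host buffers, with a GPU-aware runtime that accepts device memory directly. The C sketch below is not taken from the paper; it is a minimal illustration of that programming-model difference, assuming an MPI library (such as the modified MPICH2 runtime the paper describes) that can handle CUDA device pointers. The buffer size, tag, and helper-function names are illustrative only.

/* Minimal sketch (not the paper's implementation) contrasting the two
 * intranode communication styles described in the abstract.  Assumes
 * an MPI library built with CUDA (GPU) awareness for the second path. */
#include <mpi.h>
#include <cuda_runtime.h>
#include <stdlib.h>

#define N   (1 << 20)   /* number of doubles exchanged (illustrative) */
#define TAG 0

/* Conventional, GPU-unaware path: the programmer stages data through
 * host memory, paying extra copies on both sides of the transfer. */
static void staged_send(double *d_buf, int dst, MPI_Comm comm)
{
    double *h_buf = (double *)malloc(N * sizeof(double));
    cudaMemcpy(h_buf, d_buf, N * sizeof(double), cudaMemcpyDeviceToHost);
    MPI_Send(h_buf, N, MPI_DOUBLE, dst, TAG, comm);
    free(h_buf);
}

/* GPU-aware path: the device pointer is handed to MPI directly and the
 * runtime performs the data movement internally, which is what allows
 * an intranode implementation to avoid the redundant host-side copies. */
static void gpu_aware_send(double *d_buf, int dst, MPI_Comm comm)
{
    MPI_Send(d_buf, N, MPI_DOUBLE, dst, TAG, comm);
}

int main(int argc, char **argv)
{
    int rank;
    double *d_buf;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    cudaMalloc((void **)&d_buf, N * sizeof(double));

    if (rank == 0) {
        staged_send(d_buf, 1, MPI_COMM_WORLD);     /* extra copies      */
        gpu_aware_send(d_buf, 1, MPI_COMM_WORLD);  /* runtime-managed   */
    } else if (rank == 1) {
        /* Matching staged receive: host buffer, then copy to device.  */
        double *h_tmp = (double *)malloc(N * sizeof(double));
        MPI_Recv(h_tmp, N, MPI_DOUBLE, 0, TAG, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        cudaMemcpy(d_buf, h_tmp, N * sizeof(double),
                   cudaMemcpyHostToDevice);
        free(h_tmp);
        /* GPU-aware receive: the device buffer goes straight to MPI.  */
        MPI_Recv(d_buf, N, MPI_DOUBLE, 0, TAG, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
    }

    cudaFree(d_buf);
    MPI_Finalize();
    return 0;
}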

Original language: English
Title of host publication: Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2012
Pages: 1838-1847
Number of pages: 10
DOIs
Publication status: Published - 2012
Externally published: Yes
Event: 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2012 - Shanghai, China
Duration: 21 May 2012 - 25 May 2012

Publication series

Name: Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2012

Conference

Conference: 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2012
Country/Territory: China
City: Shanghai
Period: 21/05/12 - 25/05/12

Keywords

  • CUDA
  • GPU
  • Intranode communication
  • MPI
  • MPICH2
  • Nemesis
