TY - GEN
T1 - Improving MPI-IO output performance with active buffering plus threads
AU - Ma, Xiaosong
AU - Winslett, Marianne
AU - Lee, Jonghyun
AU - Yu, Shengke
N1 - Publisher Copyright:
© 2003 IEEE.
PY - 2003
Y1 - 2003
N2 - Efficient collective output of intermediate results to secondary storage becomes more and more important for scientific simulations as the gap between processing power/interconnection bandwidth and the I/O system bandwidth enlarges. Dedicated servers can offload I/O from compute processors and shorten the execution time, but it is not always possible or easy for an application to use them. We propose the use of active buffering with threads (ABT) for overlapping I/O with computation efficiently and flexibly without dedicated I/O servers. We show that the implementation of ABT in ROMIO, a popular implementation of MPI-IO, greatly reduces the application-visible cost of ROMIO's collective write calls, and improves an application's overall performance by hiding I/O cost and saving implicit synchronization overhead from collective write operations. Further, ABT is high-level, platform-independent, and transparent to users, giving users the benefit of overlapping I/O with other processing tasks even when the file system or parallel I/O library does not support asynchronous I/O.
AB - Efficient collective output of intermediate results to secondary storage becomes more and more important for scientific simulations as the gap between processing power/interconnection bandwidth and the I/O system bandwidth enlarges. Dedicated servers can offload I/O from compute processors and shorten the execution time, but it is not always possible or easy for an application to use them. We propose the use of active buffering with threads (ABT) for overlapping I/O with computation efficiently and flexibly without dedicated I/O servers. We show that the implementation of ABT in ROMIO, a popular implementation of MPI-IO, greatly reduces the application-visible cost of ROMIO's collective write calls, and improves an application's overall performance by hiding I/O cost and saving implicit synchronization overhead from collective write operations. Further, ABT is high-level, platform-independent, and transparent to users, giving users the benefit of overlapping I/O with other processing tasks even when the file system or parallel I/O library does not support asynchronous I/O.
UR - http://www.scopus.com/inward/record.url?scp=84947207861&partnerID=8YFLogxK
U2 - 10.1109/IPDPS.2003.1213165
DO - 10.1109/IPDPS.2003.1213165
M3 - Conference contribution
AN - SCOPUS:84947207861
T3 - Proceedings - International Parallel and Distributed Processing Symposium, IPDPS 2003
BT - Proceedings - International Parallel and Distributed Processing Symposium, IPDPS 2003
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - International Parallel and Distributed Processing Symposium, IPDPS 2003
Y2 - 22 April 2003 through 26 April 2003
ER -