TY - JOUR
T1 - Efficient parallel computations on the reduced mesh of trees organization
AU - Alnuweiri, Hussein M.
AU - Prasanna, V. K.
PY - 1994/2
Y1 - 1994/2
N2 - Optimal and near optimal parallel algorithms for several fundamental problems are proposed for a parallel organization consistmg of n processors, each having access to a row and a column of an n × n array of memory modules. Parallel computations are implemented on such an organization by decomposing them into alternating orthogonal processing phases. A number of efficient data movement techniques are developed for the proposed organization which lead to optimal or near optimal solutions to several communication-intensive problems such as sorting, performing permutations, list ranking (data dependent parallel prefix), and problems on graphs represented by an unsorted list of n2 edges. It is also shown that the proposed organization is capable of simulating any fixed-degree network on n2 processors with O(n) loss in time, which is optimal. Finally, an enhanced organization having p processors, 1 ≤ p ≤ n2, and O(n2) memory locations is presented, and is shown to provide optimal speedups for adjacency-matrix based graph problems for any number of processors in the range [1, n3/2].
AB - Optimal and near optimal parallel algorithms for several fundamental problems are proposed for a parallel organization consistmg of n processors, each having access to a row and a column of an n × n array of memory modules. Parallel computations are implemented on such an organization by decomposing them into alternating orthogonal processing phases. A number of efficient data movement techniques are developed for the proposed organization which lead to optimal or near optimal solutions to several communication-intensive problems such as sorting, performing permutations, list ranking (data dependent parallel prefix), and problems on graphs represented by an unsorted list of n2 edges. It is also shown that the proposed organization is capable of simulating any fixed-degree network on n2 processors with O(n) loss in time, which is optimal. Finally, an enhanced organization having p processors, 1 ≤ p ≤ n2, and O(n2) memory locations is presented, and is shown to provide optimal speedups for adjacency-matrix based graph problems for any number of processors in the range [1, n3/2].
UR - http://www.scopus.com/inward/record.url?scp=43949161684&partnerID=8YFLogxK
U2 - 10.1006/jpdc.1994.1013
DO - 10.1006/jpdc.1994.1013
M3 - Article
AN - SCOPUS:43949161684
SN - 0743-7315
VL - 20
SP - 121
EP - 135
JO - Journal of Parallel and Distributed Computing
JF - Journal of Parallel and Distributed Computing
IS - 2
ER -