When using BlobFlow
on many processors, whether on a single SMP machine or
across several networked machines, there are three things to remember. Every
multicomputer is different. Every multicomputer is different. Every
multicomputer is different.
While it is increasingly common to find large numbers of unused computers on
which to run your calculations, using BlobFlow
with MPI may require some tuning.
The most important consideration is communication overhead. Scaling across
multiple processors requires that the individual processors communicate from
time to time. Naturally, it is best to keep this communication to a minimum
because message passing is the slowest operation in the algorithm.
Both the peer and the master-slave schemes do their best to keep all
processors busy, but network latency can spoil things.
For the master-slave algorithm, WORKSIZE is a key parameter that can be
tuned. This integer controls the size, in blobs (computational elements),
of the work packets handed to individual processes. Larger worksizes mean
larger packets exchanged less frequently; smaller worksizes mean smaller
packets exchanged more often. Tuning WORKSIZE for your network can
dramatically improve (or diminish) the performance of BlobFlow.
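To make the role of WORKSIZE concrete, here is a minimal sketch of a
WORKSIZE-chunked master-slave loop in C with MPI. The message tags, buffer
layout, and the compute_blobs() kernel are illustrative assumptions, not
BlobFlow's actual source.

    /* Hypothetical master-slave sketch: handing out WORKSIZE-sized
     * chunks of blobs over MPI.  Tags, buffers, and compute_blobs()
     * are illustrative assumptions, not BlobFlow's actual code. */
    #include <mpi.h>

    #define WORKSIZE 64    /* blobs per work packet; tune for your network */
    #define TAG_WORK 1
    #define TAG_DONE 2

    extern void compute_blobs(int first, int last);  /* hypothetical kernel */

    static void master(int nblobs, int nprocs)
    {
        int next = 0, active = 0, range[2] = {0, 0};
        MPI_Status st;

        /* Seed every slave with an initial chunk, or retire it at once
         * if there is no work for it. */
        for (int p = 1; p < nprocs; p++) {
            if (next < nblobs) {
                range[0] = next;
                range[1] = (next + WORKSIZE < nblobs) ? next + WORKSIZE : nblobs;
                MPI_Send(range, 2, MPI_INT, p, TAG_WORK, MPI_COMM_WORLD);
                next = range[1];
                active++;
            } else {
                MPI_Send(range, 2, MPI_INT, p, TAG_DONE, MPI_COMM_WORLD);
            }
        }
        /* As each slave reports back, hand it the next chunk or retire it. */
        while (active > 0) {
            int ack;
            MPI_Recv(&ack, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG,
                     MPI_COMM_WORLD, &st);
            if (next < nblobs) {
                range[0] = next;
                range[1] = (next + WORKSIZE < nblobs) ? next + WORKSIZE : nblobs;
                MPI_Send(range, 2, MPI_INT, st.MPI_SOURCE, TAG_WORK,
                         MPI_COMM_WORLD);
                next = range[1];
            } else {
                MPI_Send(range, 2, MPI_INT, st.MPI_SOURCE, TAG_DONE,
                         MPI_COMM_WORLD);
                active--;
            }
        }
    }

    static void slave(void)
    {
        int range[2], ack = 0, done = 0;
        MPI_Status st;

        while (!done) {
            MPI_Recv(range, 2, MPI_INT, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &st);
            if (st.MPI_TAG == TAG_DONE) {
                done = 1;
            } else {
                compute_blobs(range[0], range[1]);
                MPI_Send(&ack, 1, MPI_INT, 0, TAG_WORK, MPI_COMM_WORLD);
            }
        }
    }

Doubling WORKSIZE roughly halves the number of messages per sweep, but chunks
that are too coarse can leave some slaves idle while the last few finish; the
right balance depends on your network's latency and bandwidth.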
For the peer algorithm, a similar parameter called SMALLWORK governs whether a process will share work with its companion when the companion has finished all of its own tasks. If the process has SMALLWORK elements or fewer left to do, it does not share the work, because the remainder would not be worth the transmission time. If it has more than SMALLWORK elements, it shares half of its remaining work with the companion.
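The sharing decision reduces to a single threshold test. Here is a minimal
sketch assuming a simple per-process blob count; the function name and the
value of SMALLWORK are hypothetical.

    /* Hypothetical sketch of the peer scheme's SMALLWORK test; the
     * threshold value and function name are illustrative assumptions. */
    #define SMALLWORK 16  /* below this, shipping work costs more than it saves */

    /* Decide how many blobs to hand to an idle companion; 'remaining'
     * is the number of blobs this process still has to evaluate. */
    int blobs_to_share(int remaining)
    {
        if (remaining <= SMALLWORK)
            return 0;            /* too little left: not worth transmission time */
        return remaining / 2;    /* otherwise give the companion half */
    }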