BlobFlow is written entirely in ANSI C. While it was developed on Unix platforms, there are no machine-specific functions in the code. The code is distributed with a vanilla Makefile for the Unix make utility.
BlobFlow uses conditional compilation so that users can customize the executable for special situations. For instance, a user may wish to impose symmetries on the problem and reduce the number of computational elements accordingly.
Several conditional features are available.
1. Antisymmetry about the x-axis (-DXANTISYMM): Since I am interested in vortex dipole collisions, where there is an implicit symmetry about the x-axis, I built this into the code using this compiler switch. When using this feature, the code only requires half of the usual amount of information, and there is a savings of a factor of two in CPU time for a calculation (the image treatment is sketched after this list).
2. Use the Message Passing Interface (MPI) to parallelize across multiple processors (-DMULTIPROC): Many aspects of the algorithm are parallelizable, and this feature spreads the work among many processors. To use this capability, you must install LAM/MPI (available at http://www.mpi.nd.edu/lam) properly on the machines that you wish to use. You should also familiarize yourself with running programs that make MPI calls. It's easy to learn and worth the time. BlobFlow uses two algorithms under MPI. If only two processors are available, it uses a peer-based scheme to spread the work evenly between the two processors. If more than two CPUs are available, BlobFlow uses a master-slave algorithm with a receive-and-dispatch scheme to balance the work amongst all available processors. This means that one process is a dedicated master that merely oversees activities without doing any number crunching. In principle, a two-processor peer-based system will run about as fast as, and perhaps a little faster than, a three-processor master-slave system. However, if one has N+1 processors, the total computation time should scale like 1/N for small groups of processors (the scheme selection is sketched after this list).
3. Cache resorting (-DCACHERESORT): Though CPU speeds have increased substantially in recent years, front-side bus and memory speeds have not kept pace. In fact, main memory is orders of magnitude slower than the CPU. A small reserve of fast memory, called cache, bridges part of this gap. When instructions and data for calculations are resident in this cache, codes will run substantially faster. Computer architects and compiler authors are pretty crafty at optimizing the use of cache, but there are also ways to write code to take advantage of it. I have attempted to do this in BlobFlow (a spatial resort is sketched after this list) but have found that cache awareness has little or no impact on the platforms I have used. However, since cache structure is machine dependent, I have left the feature in place.
4. Merging diagnostics (-DMERGEDIAG): Sometimes users wish to collect information about when and how many blobs are merged. With this flag set, BlobFlow will dump merge information to the diagnostic log (sketched after this list). Every merging event is listed as an integer indicating how many elements were clustered into that single merging event.
5. Direct summation (-DNOFASTMP): To accelerate computations, the program uses fast multipole summation for the velocity and velocity-derivative calculations. While this is accepted practice, it can induce small but quantifiable errors. Fast summation can be disabled with the NOFASTMP flag, in which case the code falls back on exact pairwise summation (sketched after this list).
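The sketches that follow illustrate the ideas behind these switches. None of them is BlobFlow's actual source; the vortex type, function names, and parameters are all hypothetical, and an unregularized point-vortex kernel is used for brevity where the real code uses regularized blobs. First, the antisymmetry switch: with XANTISYMM defined, each stored element also contributes through its image across the x-axis with opposite circulation, so only half the elements need to be stored.

    #include <math.h>

    #ifndef M_PI
    #define M_PI 3.14159265358979323846
    #endif

    typedef struct { double x, y, gamma; } vortex; /* hypothetical element type */

    /* Velocity induced at (xe, ye) by n stored vortices.  With XANTISYMM
       defined, each element also acts through its image at (x, -y) with
       circulation -gamma, halving the storage and the work. */
    void induced_velocity(const vortex *v, int n, double xe, double ye,
                          double *uo, double *vo)
    {
        int i;
        double u = 0.0, w = 0.0, dx, dy, r2;
        for (i = 0; i < n; i++) {
            dx = xe - v[i].x; dy = ye - v[i].y;
            r2 = dx * dx + dy * dy;
            u -= v[i].gamma * dy / (2.0 * M_PI * r2);
            w += v[i].gamma * dx / (2.0 * M_PI * r2);
    #ifdef XANTISYMM
            dy = ye + v[i].y;   /* image at (x, -y), circulation -gamma */
            r2 = dx * dx + dy * dy;
            u += v[i].gamma * dy / (2.0 * M_PI * r2);
            w -= v[i].gamma * dx / (2.0 * M_PI * r2);
    #endif
        }
        *uo = u; *vo = w;
    }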
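Second, the parallel switch. Assuming only that the choice of scheme is driven by the size of MPI_COMM_WORLD, the branching might look like the following, with rank 0 serving as the dedicated master once three or more processes are available.

    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        if (size == 2) {
            /* peer scheme: each rank computes half the elements,
               then the two exchange results directly */
        } else if (rank == 0) {
            /* master: receive requests and dispatch work units;
               does no number crunching itself */
        } else {
            /* slave: request a work unit, compute it, send back results */
        }
        MPI_Finalize();
        return 0;
    }

Under LAM/MPI, one would typically boot the run-time environment with lamboot and then launch with something like mpirun -np 4 blobflow.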
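Third, cache resorting amounts to reordering the element array so that blobs that are close in space, and therefore interact strongly, are also close in memory. A minimal sketch, assuming a coarse spatial bin as the sort key (BlobFlow's actual sort criterion may differ):

    #include <stdlib.h>
    #include <math.h>

    typedef struct { double x, y, gamma; } vortex; /* hypothetical element type */

    #define BIN_W   0.5    /* assumed coarse bin width */
    #define BINS_X  1024   /* assumed bins per row     */

    /* Coarse row-major spatial bin index for one element. */
    static int bin_of(const vortex *v)
    {
        int ix = (int)floor(v->x / BIN_W);
        int iy = (int)floor(v->y / BIN_W);
        return iy * BINS_X + ix;
    }

    static int cmp_bins(const void *a, const void *b)
    {
        return bin_of((const vortex *)a) - bin_of((const vortex *)b);
    }

    /* Reorder the elements so that members of the same spatial bin
       are contiguous in memory, improving cache reuse during sums. */
    void cache_resort(vortex *v, int n)
    {
    #ifdef CACHERESORT
        qsort(v, n, sizeof(vortex), cmp_bins);
    #endif
    }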
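Fourth, the merging diagnostic is the simplest of the switches: each merging event just logs the size of its cluster. A sketch, with a hypothetical log handle and function name:

    #include <stdio.h>

    /* Called once per merging event; nclustered is the number of
       elements combined into a single new blob. */
    void report_merge(FILE *diag_log, int nclustered)
    {
    #ifdef MERGEDIAG
        fprintf(diag_log, "%d\n", nclustered);
    #endif
    }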
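Finally, the direct summation that NOFASTMP selects is essentially the exact but O(N^2) pairwise sum that the fast multipole method approximates (again with a point-vortex kernel standing in for the real blob kernels and velocity derivatives):

    #include <math.h>

    #ifndef M_PI
    #define M_PI 3.14159265358979323846
    #endif

    typedef struct { double x, y, gamma; } vortex; /* hypothetical element type */

    /* Direct O(N^2) evaluation: the velocity of every blob induced by
       every other blob.  Exact up to round-off, unlike the multipole
       approximation, but far slower for large n. */
    void direct_velocities(const vortex *v, int n, double *u, double *w)
    {
        int i, j;
        double dx, dy, r2;
        for (i = 0; i < n; i++) {
            u[i] = 0.0; w[i] = 0.0;
            for (j = 0; j < n; j++) {
                if (j == i) continue;
                dx = v[i].x - v[j].x;
                dy = v[i].y - v[j].y;
                r2 = dx * dx + dy * dy;
                u[i] -= v[j].gamma * dy / (2.0 * M_PI * r2);
                w[i] += v[j].gamma * dx / (2.0 * M_PI * r2);
            }
        }
    }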
To use these options either together or separately, under the GNU make
utility, you just set the switches on the command line. For instance,
make mpi=on xantisymm=on
will build an executable with both options.
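For orientation, switches of this sort are typically wired up inside the Makefile with GNU make conditionals roughly like the fragment below; the variable names in BlobFlow's actual Makefile may differ.

    # Hypothetical fragment: map command-line switches onto compiler flags.
    ifeq ($(mpi),on)
    CFLAGS += -DMULTIPROC
    CC      = mpicc      # assumes an MPI compiler wrapper is on the path
    endif
    ifeq ($(xantisymm),on)
    CFLAGS += -DXANTISYMM
    endif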
If you do not have the GNU make utility on your system, you must edit the
vanilla Makefile. It is easy to do. Just follow the instructions and
comment/uncomment the appropriate sections of flags.
Louis F Rossi
2001-08-01