
Build options

BlobFlow$^{\rm TM}$ is written entirely in ANSI C. While it was developed on Unix platforms, the code contains no machine-specific functions. It is distributed with a vanilla Makefile for the Unix make utility.

BlobFlow$^{\rm TM}$ uses conditional compilation so that users can customize the executable for special situations. For instance, a user may wish to impose symmetries on the problem and reduce the number of computational elements accordingly. Several conditional features are available.
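
The pattern is ordinary preprocessor guarding. Here is a minimal sketch; FEATURE is a hypothetical flag used only for illustration, not one of the actual switches listed below:

#include <stdio.h>

int main(void)
{
#ifdef FEATURE
    /* Compiled in only when built with -DFEATURE. */
    printf("Optional feature enabled.\n");
#else
    printf("Default build.\n");
#endif
    return 0;
}

Compiling with cc -DFEATURE yields the first message; a plain cc build yields the second.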

1.
Antisymmetry about the x-axis (-DXANTISYMM): Since I am interested in vortex dipole collisions, where there is an implicit symmetry about the x-axis, I built this into the code behind this compiler switch. When using this feature, the code only requires half of the usual amount of information, and there is a savings of a factor of two in CPU time for a calculation. (The image-element idea is sketched after this list.)

2.
Use the Message Passing Interface (MPI) to parallelize across multiple processors (-DMULTIPROC): Many aspects of the algorithm are parallelizable, and this feature spreads the work among many processors. To use this capability, you must install LAM/MPI (available at http://www.mpi.nd.edu/lam) properly on the machines that you wish to use. You should also familiarize yourself with running programs that make MPI calls; it's easy to learn and worth the time. BlobFlow$^{\rm TM}$ uses two algorithms under MPI. If only two processors are available, it uses a peer-based scheme to spread the work evenly between the two processors. If more than two CPUs are available, BlobFlow$^{\rm TM}$ uses a master-slave algorithm with a receive-and-dispatch scheme to balance the work among all available processors. This means that one process is a dedicated master that merely oversees activities without doing any number crunching. In principle, a two-processor peer-based system will run about as fast as, and perhaps a little faster than, a three-processor master-slave system. However, with N+1 processors, the total computation time should scale like 1/N for small groups of processors. (The receive-and-dispatch scheme is sketched after this list.)

3.
Cache resorting (-DCACHERESORT): Though CPU speeds have increased substantially in recent years, front-side bus and memory speeds have not kept pace; in fact, main memory is orders of magnitude slower than the CPU. A small reserve of fast memory, called cache, bridges the gap. When the instructions and data for a calculation are resident in this cache, codes run substantially faster. Computer architects and compiler authors are quite crafty at optimizing the use of cache, but there are also ways to write code to take advantage of it. I have attempted to do this in BlobFlow$^{\rm TM}$ but have found that cache awareness has little or no impact on the platforms I have used. However, since cache structure is machine dependent, I have left the feature in place. (One resorting strategy is sketched after this list.)

4.
Merging diagnostics (-DMERGEDIAG): Sometimes users wish to collect information about when and how many blobs are merged. With this flag set, BlobFlow$^{\rm TM}$ will dump merging information to the diagnostic log: each merging event is recorded as an integer indicating how many elements were clustered into that event.

5.
Direct summation (-DNOFASTMP): To accelerate computations, the program uses fast multipole summation for the velocity and velocity-derivative calculations. While this is accepted practice, it can induce small but quantifiable errors, so fast summation can be disabled with the NOFASTMP flag, forcing direct pairwise summation instead. (The compile-time switch is sketched after this list.)
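
To make the antisymmetry option concrete, here is a minimal sketch of the image-element idea behind -DXANTISYMM. It uses a point-vortex kernel as a stand-in for BlobFlow$^{\rm TM}$'s actual blob kernel, and all names are illustrative rather than taken from the source:

#include <math.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

/* Velocity induced at (x,y) by n stored elements with positions
   (xj,yj) and circulations gam. Under XANTISYMM, only elements with
   yj >= 0 are stored, and each also contributes its mirror image at
   (xj,-yj) with circulation -gam[j]. */
void induced_velocity(double x, double y,
                      const double *xj, const double *yj,
                      const double *gam, int n,
                      double *u, double *v)
{
    int j;
    *u = *v = 0.0;
    for (j = 0; j < n; ++j) {
        double dx = x - xj[j], dy = y - yj[j];
        double r2 = dx * dx + dy * dy;
        if (r2 > 0.0) {
            *u -= gam[j] * dy / (2.0 * M_PI * r2);
            *v += gam[j] * dx / (2.0 * M_PI * r2);
        }
#ifdef XANTISYMM
        /* Image element: mirrored position, opposite circulation. */
        dy = y + yj[j];
        r2 = dx * dx + dy * dy;
        if (r2 > 0.0) {
            *u += gam[j] * dy / (2.0 * M_PI * r2);
            *v -= gam[j] * dx / (2.0 * M_PI * r2);
        }
#endif
    }
}

Every target still feels the influence of all elements, images included, but velocities need only be evaluated at the stored half of the elements, which is where the factor of two in CPU time comes from.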
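
The receive-and-dispatch idea behind -DMULTIPROC can be sketched as follows. This is an illustration of the general master-slave pattern, not BlobFlow$^{\rm TM}$'s actual source, and it assumes at least as many work units as workers:

#include <mpi.h>

#define TAG_WORK 1
#define TAG_STOP 2

/* Rank 0 is the dedicated master: it primes each worker with one work
   unit, then hands the next unit to whichever worker reports back
   first, so faster processors automatically receive more work. */
void master(int nproc, int ntasks)
{
    int task = 0, result, w;
    MPI_Status st;
    for (w = 1; w < nproc; ++w) {
        MPI_Send(&task, 1, MPI_INT, w, TAG_WORK, MPI_COMM_WORLD);
        ++task;
    }
    while (task < ntasks) {
        MPI_Recv(&result, 1, MPI_INT, MPI_ANY_SOURCE, TAG_WORK,
                 MPI_COMM_WORLD, &st);
        MPI_Send(&task, 1, MPI_INT, st.MPI_SOURCE, TAG_WORK,
                 MPI_COMM_WORLD);
        ++task;
    }
    for (w = 1; w < nproc; ++w) {   /* drain last results, shut down */
        MPI_Recv(&result, 1, MPI_INT, MPI_ANY_SOURCE, TAG_WORK,
                 MPI_COMM_WORLD, &st);
        MPI_Send(&task, 1, MPI_INT, st.MPI_SOURCE, TAG_STOP,
                 MPI_COMM_WORLD);
    }
}

/* Workers crunch numbers until told to stop. */
void worker(void)
{
    int task, result;
    MPI_Status st;
    for (;;) {
        MPI_Recv(&task, 1, MPI_INT, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &st);
        if (st.MPI_TAG == TAG_STOP)
            break;
        result = task;   /* stand-in for the real computation */
        MPI_Send(&result, 1, MPI_INT, 0, TAG_WORK, MPI_COMM_WORLD);
    }
}

Under LAM/MPI, a program structured this way would typically be launched with mpirun, e.g. mpirun -np 4 program.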
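
One way to exploit cache in the spirit of -DCACHERESORT, though not necessarily BlobFlow$^{\rm TM}$'s exact strategy, is to periodically resort the elements by coarse spatial bin so that elements which interact strongly are also adjacent in memory:

#include <stdlib.h>

typedef struct {
    double x, y, gam;
    int bin;              /* coarse grid cell index */
} element;

static int by_bin(const void *a, const void *b)
{
    const element *p = (const element *) a;
    const element *q = (const element *) b;
    return (p->bin > q->bin) - (p->bin < q->bin);
}

/* Bin elements on an nx-wide grid of cell size h (coordinates are
   assumed shifted to be non-negative), then sort so that each cell's
   elements are contiguous in memory. */
void cache_resort(element *e, int n, double h, int nx)
{
    int i;
    for (i = 0; i < n; ++i)
        e[i].bin = (int) (e[i].y / h) * nx + (int) (e[i].x / h);
    qsort(e, n, sizeof(element), by_bin);
}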
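
Finally, -DNOFASTMP amounts to choosing between two summation routines at compile time. The function names below are hypothetical placeholders, not BlobFlow$^{\rm TM}$'s actual interfaces:

void velocity_direct(double *u, double *v, int n); /* exact O(N^2) sum */
void velocity_fastmp(double *u, double *v, int n); /* multipole approx. */

void compute_velocities(double *u, double *v, int n)
{
#ifdef NOFASTMP
    velocity_direct(u, v, n);   /* slower, but no expansion error */
#else
    velocity_fastmp(u, v, n);   /* fast, with small truncation error */
#endif
}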

To use these options, either together or separately, under the GNU make utility, you just set the switches on the command line. For instance,

make mpi=on xantisymm=on
will build an executable with both options.

If you do not have the GNU make utility on your system, you must edit the vanilla Makefile. This is easy to do: just follow the instructions in the file and comment or uncomment the appropriate sections of flags.
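
For instance, the flag section of such a Makefile might look like the fragment below; this is illustrative only, and the layout of the distributed Makefile may differ:

# Default build:
CFLAGS = -O2
# Uncomment one of these instead to enable features:
#CFLAGS = -O2 -DXANTISYMM
#CFLAGS = -O2 -DXANTISYMM -DMULTIPROC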


Louis F Rossi
2001-08-01