17. Parallelism

Parallelism is built into the current version of Febrl and is transparent to the user. Running Febrl in parallel makes it possible to solve problems in a shorter run time than running them sequentially, or alternatively to solve larger problems, because parallel computing platforms usually provide larger amounts of memory.

To use the parallel functionality of Febrl, both an implementation of MPI (the Message Passing Interface) and Pypar (a Python binding for MPI) must be installed on your computing platform. This assumes that parallel hardware, such as a multiprocessor machine or a cluster of personal computers or workstations, is available.

Once both MPI and Pypar are installed and tested successfully, you can run Febrl in parallel with the mpirun command of your MPI implementation. For example, if you have a Febrl project module called myproject.py and a parallel platform with 8 processors, you can start Febrl on all 8 processors with

  mpirun -np 8 python myproject.py
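To give a feel for what running on 8 processes can mean for the work each process does, the following is a minimal sketch of a block partitioning of records by process number. It is an illustration only and not taken from the Febrl sources: the function name block_range and the record count of 100 are hypothetical, and Febrl's actual distribution scheme may differ.

```python
def block_range(num_records, num_procs, rank):
    """Return the half-open index range (start, end) assigned to process `rank`.

    Hypothetical helper for illustration; not part of Febrl or Pypar.
    """
    base, extra = divmod(num_records, num_procs)
    # The first `extra` processes each take one additional record
    start = rank * base + min(rank, extra)
    end = start + base + (1 if rank < extra else 0)
    return start, end


# 8 processes, as in the mpirun example above, and 100 hypothetical records
for rank in range(8):
    print(rank, block_range(100, 8, rank))
```

In a real MPI program the process number and the total number of processes would be obtained from the MPI library at run time rather than passed in by hand.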

Note: To access the data sets and look-up tables, all processors must have access to the directory (and its sub-directories) defined in the Febrl object attribute febrl_path. Future versions of Febrl will allow a much more sophisticated definition of parallelisation settings.
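One way to check the requirement above before starting a long job is to test, on each node, that the directory is readable. The sketch below assumes a hypothetical helper path_is_accessible and uses only the Python standard library; it is not part of Febrl.

```python
import os


def path_is_accessible(febrl_path):
    """Return True if `febrl_path` is an existing directory that the current
    process may read and enter.

    Hypothetical helper for illustration; not part of Febrl.
    """
    return os.path.isdir(febrl_path) and os.access(febrl_path, os.R_OK | os.X_OK)


# Example: check the current working directory
print(path_is_accessible("."))
```

Running such a check on every node (for example through mpirun) would show early whether a shared file system is mounted everywhere it is needed.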

Note: Parallelism within Febrl is in its initial stages. We invite anyone interested in this area to contact the authors to exchange detailed information and experiences.

Warning: During extensive tests of parallel Febrl jobs, we got slightly different numerical results in some cases when linking or deduplicating larger data sets (around 100,000 records or more). When comparing the results of running the same job on different numbers of processes, the final weights (as stored in a results file) sometimes differ in the range of $10^{-4}$ (fourth or fifth digit after the decimal point).

So far we have not found the cause of these differences, which might lie in our local MPI/Pypar installation or within one of the Febrl modules.

We are currently working on this problem and will publish an updated and hopefully correct version of Febrl as soon as the problem is solved.
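To see whether two runs of the same job are affected by the differences described in this warning, their final weights can be compared with a tolerance. The following is a minimal sketch under stated assumptions: the weights are assumed to have already been read from the results files into dictionaries that map a record pair to its weight, and the function name differing_weights, the default tolerance, and the sample values are all hypothetical. The format of the results file itself is not covered here.

```python
def differing_weights(run_a, run_b, abs_tol=1e-3):
    """Return a list of (pair, weight_a, weight_b) for record pairs whose
    weights differ by more than `abs_tol`, or which appear in only one run.

    Hypothetical helper for illustration; not part of Febrl.
    """
    diffs = []
    for pair in sorted(set(run_a) | set(run_b)):
        a = run_a.get(pair)
        b = run_b.get(pair)
        if a is None or b is None or abs(a - b) > abs_tol:
            diffs.append((pair, a, b))
    return diffs


# Made-up weights from two runs on different numbers of processes
run_1 = {(0, 1): 12.34567, (0, 2): -3.5}
run_2 = {(0, 1): 12.34571, (0, 2): -3.5}

print(differing_weights(run_1, run_2))                # loose tolerance
print(differing_weights(run_1, run_2, abs_tol=1e-5))  # strict tolerance
```

With a tolerance looser than the observed range, differences of the size described above are ignored; with a stricter one they are reported.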