Parallelism is built into the current version of Febrl in a way that is
transparent to the user. Running Febrl in parallel makes it possible to
solve problems in a shorter run-time than running them
sequentially, or alternatively to solve larger problems thanks to
the (usually) larger amount of memory available on parallel platforms.
To use the parallel functionality of Febrl, the following
software must be installed on your computing platform (assuming
parallel hardware such as a multiprocessor or a cluster of personal
computers or workstations is available):
- MPI (Message Passing Interface)
MPI is a de facto standard for parallel programming on distributed-
memory platforms. It defines a large number of routines for
communicating data (i.e. messages) between processors. While
MPI itself only defines the standard (so that programs written
in MPI are portable to various parallel platforms), there are
different implementations available, some from vendors of
parallel (super)computers, others as freely downloadable
packages. Please see the MPI web page at
for more information on MPI and links to various
implementations. Note that on many platforms administrator (or
superuser) access rights are needed to install an MPI
implementation.
- Pypar
Pypar is an efficient and easy-to-use module that
allows programs/scripts written in the Python programming
language to run in parallel on multiple processors and
communicate using MPI. See the Pypar web page at
for more information and to download the package.
Once both MPI and Pypar are installed and tested successfully, you can
run Febrl in parallel by using the mpirun command
of your MPI implementation. For example, if you have a Febrl
project module called myproject.py and you have a parallel
platform with 8 processors, you can run Febrl in parallel by typing
mpirun -np 8 python myproject.py
To access the data sets and look-up tables, all processors must
have access to the directory (and sub-directories) defined in the
Febrl object attribute febrl_path. Future versions of Febrl will
allow a much more sophisticated definition of parallelisation settings.
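Because a missing or unreadable directory on one node usually only shows up as a failure on that node, it can help to check access on every process before any work starts. The sketch below uses a hypothetical helper, check_febrl_path(), built only on the standard os module; it does not read Febrl's own configuration, and the list of required files is an assumption supplied by the caller.

```python
import os


def check_febrl_path(febrl_path, required=()):
    """Return a list of problems found with febrl_path (empty if none).

    `required` holds relative paths (data sets, look-up tables) that
    every process needs to read; the names are supplied by the caller.
    """
    if not os.path.isdir(febrl_path):
        return [f"not a directory: {febrl_path}"]

    problems = []
    # Reading a directory's entries needs both read and execute rights.
    if not os.access(febrl_path, os.R_OK | os.X_OK):
        problems.append(f"directory not readable: {febrl_path}")
    for rel in required:
        full = os.path.join(febrl_path, rel)
        if not os.access(full, os.R_OK):
            problems.append(f"missing or unreadable: {full}")
    return problems
```

Each process would call this with the relative paths of the data sets and look-up tables its project uses, and stop with an error message (including its own process number) if the returned list is not empty.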
Note that parallelism within Febrl is still in its initial
stages, and we invite anyone interested in this area to contact
the authors to exchange detailed information and experiences.
While doing extensive tests of parallel Febrl, we found that in
some cases we got slightly different numerical results when linking
or deduplicating larger data sets (with around
records). When comparing the results from running the same job on
different numbers of processes, the final weights (as stored in a
results file) sometimes differ in the range of
or fifth digit after the decimal point).
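When checking for such differences, it helps to compare the weights of two runs against an explicit tolerance rather than textually. The sketch below assumes the weights have already been read from the two results files into hypothetical dictionaries that map a record pair to its weight; the results file format is not covered here, and the default tolerances are arbitrary choices.

```python
import math


def compare_weights(weights_a, weights_b, rel_tol=1e-4, abs_tol=1e-4):
    """Compare two {record_pair: weight} mappings from different runs.

    Returns (only_in_a, only_in_b, differing), where `differing` maps
    each shared record pair whose weights are not close to the tuple
    (weight_a, weight_b).
    """
    keys_a, keys_b = set(weights_a), set(weights_b)
    only_in_a = sorted(keys_a - keys_b)
    only_in_b = sorted(keys_b - keys_a)
    differing = {
        pair: (weights_a[pair], weights_b[pair])
        for pair in keys_a & keys_b
        if not math.isclose(weights_a[pair], weights_b[pair],
                            rel_tol=rel_tol, abs_tol=abs_tol)
    }
    return only_in_a, only_in_b, differing
```

As a general property of parallel numerical code (not a confirmed explanation for the results described here), floating-point addition is not associative: in Python, (0.1 + 0.2) + 0.3 gives 0.6000000000000001 while 0.1 + (0.2 + 0.3) gives 0.6, so combining partial results from a different number of processes can round differently in the last digits.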
So far we have not found the cause of these problems, which might lie
in our local MPI/Pypar installation, or within one of the
We are currently working on this problem and will publish an updated
and hopefully correct version of Febrl as soon as the
problem is solved.