16. Installation

Before you install this software, you need to have Python version 2.3.2 (or later) installed on your system. You can download the Python source as well as binary distributions for various platforms from

http://www.python.org/download

Follow the configuration and installation instructions given on the Python Web site or supplied with the Python distribution you have downloaded. Make sure you set the path variables on your system so that you can start Python by simply typing python in a command line session (e.g. at the Windows MS-DOS or secured command prompt, or at the Unix or Linux shell prompt).

To do so on Windows systems, you may have to add a 'set PATH=' line to your Autoexec.bat file. For example, if you have installed Python on drive C: in the directory Python2, the line to add would be

  set PATH=C:\PYTHON2;%PATH%

On Unix systems you may have to update your .cshrc or .bashrc file by adding the path to your Python installation.
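
As a quick check of the path setup described above, the following minimal sketch reports whether a python command can be found on the search path and whether the interpreter running the script meets the 2.3.2 minimum. The helper names version_ok and find_python are illustrative and not part of Febrl, and the sketch assumes a recent Python 3 interpreter because it relies on shutil.which.

```python
import shutil
import sys

# Minimum Python version stated at the start of this chapter.
MINIMUM_VERSION = (2, 3, 2)


def version_ok(version_info, minimum=MINIMUM_VERSION):
    """Return True if the (major, minor, micro) version meets the minimum."""
    return tuple(version_info[:3]) >= minimum


def find_python(command="python"):
    """Return the full path of the command if it is on the search path, else None."""
    return shutil.which(command)


if __name__ == "__main__":
    print("Running Python", sys.version.split()[0])
    print("Version sufficient:", version_ok(sys.version_info))
    print("'python' found on the search path at:", find_python())
```

If find_python returns None, the path variable has most likely not been updated yet, or the command line session needs to be restarted to pick up the change.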

See Appendix D for a list of all files distributed with the current version of Febrl.

The Febrl system is configured and controlled by a project module derived from one of the supplied project-linkage.py, project-deduplicate.py or project-standardise.py modules. See Chapter 5 for more information on these modules.

For hidden Markov model training, two programs are needed: tagdata.py (for tagging training records) and trainhmm.py (for training a hidden Markov model). Hidden Markov model training and these two programs are described in detail in Chapter 8. The auxiliary program randomselect.py can be used to randomly select a subset of a data set (see Section 12.1 for more details).

For the current release, just unzip or untar the Febrl distribution file in a convenient location. Be sure to specify the create directories option in your unzip utility. On Unix or Linux systems, you would generally type

tar xvfz febrl-0.3.tar.gz

or similar.
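
Where no tar or unzip utility is at hand, the archive can also be unpacked with Python's standard tarfile module. The following is a minimal sketch only: it assumes a recent Python interpreter, the function name extract_distribution is illustrative and not part of Febrl, and the archive should come from a trusted source because extractall writes whatever paths the archive contains.

```python
import tarfile


def extract_distribution(archive_path, destination="."):
    """Unpack a gzip-compressed tar archive, keeping its directory layout.

    Returns the list of member names contained in the archive.
    """
    with tarfile.open(archive_path, "r:gz") as archive:
        # Only unpack archives from a trusted source.
        archive.extractall(destination)
        return archive.getnames()
```

Calling extract_distribution("febrl-0.3.tar.gz") in the directory that holds the downloaded file has the same effect as the tar command shown above, without the verbose listing.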

After unzipping or untarring the Febrl distribution, change to the Febrl directory and run the various unit test programs supplied with Febrl from there. Unit test programs are files with names of the form modulenameTest.py. For example, to test the string encoding module encode.py, type

python encodeTest.py

All unit test programs should run without any errors. If you do get error messages from these test programs, please send an e-mail to the Febrl authors including the error messages.
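
Rather than typing each test command by hand, the test programs can be run in one pass. The sketch below is one possible approach, not a tool shipped with Febrl: the function name run_unit_tests and its parameters are illustrative, it assumes the modulenameTest.py naming described above, and the driver itself needs a recent Python 3 interpreter (for subprocess.run) even though the tests are started with whatever command is passed as interpreter.

```python
import glob
import os
import subprocess


def run_unit_tests(directory=".", interpreter="python", pattern="*Test.py"):
    """Run every matching test program and return {file name: exit code}."""
    results = {}
    for path in sorted(glob.glob(os.path.join(directory, pattern))):
        name = os.path.basename(path)
        # Start each test from inside the Febrl directory, as when typed by hand.
        completed = subprocess.run([interpreter, name], cwd=directory)
        results[name] = completed.returncode
    return results
```

Called as run_unit_tests() from the Febrl directory, this returns a dictionary in which any non-zero exit code marks a test program that did not finish cleanly. The output of each test program is still printed to the terminal, so error messages can be copied from there.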

Next, depending on your needs, make a copy of the project-linkage.py, project-deduplicate.py or project-standardise.py module and modify this copy to suit your data set(s). That is all that is required at this stage. To run a project module, for example your myproject.py, simply type

python myproject.py
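
The copy step described above can also be scripted, which helps avoid overwriting an existing project module by accident. This is a minimal sketch assuming a recent Python 3 interpreter; the function name copy_template is illustrative and not part of Febrl, and the three template names are the ones listed in this chapter.

```python
import os
import shutil

# Project module templates named in this chapter.
TEMPLATES = (
    "project-linkage.py",
    "project-deduplicate.py",
    "project-standardise.py",
)


def copy_template(template, new_name, directory="."):
    """Copy one of the supplied project modules to a new file and return its path."""
    if template not in TEMPLATES:
        raise ValueError("Unknown project template: %s" % template)
    target = os.path.join(directory, new_name)
    # Refuse to overwrite a project module that has already been customised.
    if os.path.exists(target):
        raise FileExistsError("File already exists: %s" % target)
    shutil.copyfile(os.path.join(directory, template), target)
    return target
```

For example, copy_template("project-linkage.py", "myproject.py") run from the Febrl directory creates the myproject.py used in the command above, ready to be edited.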

To run Febrl in parallel, you need to have MPI and Pypar installed on your system. Please see Chapter 17 for more details on parallelism.

Note: Future versions of Febrl will use the standard Python distutils module to install the various Febrl components in the site-packages directory, as well as command line wrappers in an appropriate executable directory somewhere on the system path. This will allow a great deal more flexibility and will allow many people to share a single installation of Febrl on a single shared computer.