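Before you install this software, you need to have Python version 2.3.2 (or later) installed on your system. You can download the Python source as well as binary distributions for various platforms from the official Python web site. Python should be set up so that the interpreter can be started from a command prompt (Windows) or a Unix or Linux shell prompt.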
To do so on Windows systems, you may have to add a 'set PATH=' line to your Autoexec.bat file. Assuming you have installed Python on drive C: in the directory Python2, you may have to add a line such as
set PATH=%PATH%;C:\Python2
On Unix systems you may have to update your .cshrc or .bashrc file by adding the path to your Python installation.
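Once Python can be started from a prompt, a small script like the following can confirm that the installed version meets the 2.3.2 minimum stated above. It is a sketch and not part of Febrl. The names MIN_VERSION and python_version_ok are hypothetical. The check only tests the lower bound; it does not tell you whether a newer major version of Python suits this release.

```python
import sys

# Minimum Python version required by this Febrl release (from this manual).
MIN_VERSION = (2, 3, 2)


def python_version_ok(version_info=None):
    """Return True if the given (or the running) interpreter is at least MIN_VERSION."""
    if version_info is None:
        version_info = sys.version_info
    # Compare only major, minor and micro numbers as a plain tuple.
    return tuple(version_info[:3]) >= MIN_VERSION


if __name__ == "__main__":
    found = ".".join([str(part) for part in sys.version_info[:3]])
    if python_version_ok():
        print("Python %s found: OK" % found)
    else:
        print("Python %s found: version 2.3.2 or later is required" % found)
```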
See Appendix D for a list of all files distributed with the current version of Febrl.
The Febrl system is configured and controlled by a project module derived from either the supplied project-linkage.py, project-deduplicate.py or project-standardise.py modules. See Chapter 5 for more information on these modules.
For hidden Markov model training, the two programs tagdata.py (for tagging training records) and trainhmm.py (for training a hidden Markov model) need to be used. Hidden Markov model training and these two programs are described in detail in Chapter 8. The auxiliary program randomselect.py can be used to randomly select a subset of a data set (see Section 12.1 for more details).
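The idea of randomly selecting a subset of a data set can be illustrated in a few lines of Python. This is not the code of randomselect.py, whose usage is described in Section 12.1. The function random_select and its parameters are hypothetical. The function draws records uniformly at random without replacement and keeps them in their original order. Passing a seed makes the selection repeatable.

```python
import random


def random_select(records, sample_size, seed=None):
    """Return sample_size records chosen uniformly at random without replacement.

    Hypothetical helper for illustration only; not Febrl's randomselect.py.
    The selected records keep their original relative order.
    """
    if sample_size > len(records):
        raise ValueError("sample size is larger than the data set")
    rng = random.Random(seed)  # private generator, so a seed gives repeatable output
    chosen = sorted(rng.sample(range(len(records)), sample_size))
    return [records[i] for i in chosen]


if __name__ == "__main__":
    data = ["rec-%d" % i for i in range(10)]
    print(random_select(data, 3, seed=42))
```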
For the current release, just unzip or untar the Febrl distribution file in a convenient location. Be sure to specify the create directories option in your unzip utility. On Unix or Linux systems, you would generally type
tar xvfz febrl-0.3.tar.gz
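On systems where no tar utility is available, the same archive can be unpacked with Python's standard tarfile module. The sketch below assumes a Python 3 interpreter. The function name extract_distribution is hypothetical. If the installed tarfile module supports extraction filters, the sketch applies the "data" filter, which rejects unsafe archive members such as absolute paths or paths that would land outside the target directory.

```python
import tarfile


def extract_distribution(archive_path, target_dir="."):
    """Unpack a gzip-compressed tar archive and return its member names.

    Roughly equivalent to running 'tar xvfz archive_path' inside target_dir.
    """
    with tarfile.open(archive_path, "r:gz") as archive:
        names = archive.getnames()
        for name in names:
            print(name)  # list members, like tar's 'v' (verbose) option
        if hasattr(tarfile, "data_filter"):
            # Newer tarfile versions: refuse members with unsafe paths.
            archive.extractall(target_dir, filter="data")
        else:
            archive.extractall(target_dir)
    return names


if __name__ == "__main__":
    extract_distribution("febrl-0.3.tar.gz")
```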
After unzipping or untarring the Febrl distribution, change into the Febrl directory and run the various unit test programs supplied with Febrl from there. Unit test programs are files with names of the form modulenameTest.py. You can easily run these unit tests by typing, for example (to test the string encoding module encode.py):
python encodeTest.py
All unit test programs should run without any errors. If you do get error messages with these test programs, please send an e-mail to the Febrl authors that includes the error messages.
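If you would rather not start each test program by hand, a small driver script can run them all and list those that fail. This is a sketch and not a tool shipped with Febrl. The name run_unit_tests is hypothetical. The script assumes that each test program exits with a non-zero status when a test fails, which is the behaviour of programs that end with unittest.main(). It runs the tests with the same Python interpreter that runs the script itself.

```python
import glob
import os
import subprocess
import sys


def run_unit_tests(febrl_dir="."):
    """Run every *Test.py program in febrl_dir and return the names of those that failed."""
    failed = []
    for path in sorted(glob.glob(os.path.join(febrl_dir, "*Test.py"))):
        name = os.path.basename(path)
        print("Running %s" % name)
        # Run from inside the Febrl directory so the tests can import Febrl modules.
        status = subprocess.call([sys.executable, name], cwd=febrl_dir)
        if status != 0:  # assumption: a non-zero exit status means a test failed
            failed.append(name)
    return failed


if __name__ == "__main__":
    failures = run_unit_tests(".")
    if failures:
        print("Failed: %s" % ", ".join(failures))
    else:
        print("All unit tests passed")
```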
Next, depending on your needs, make a copy of the project-linkage.py, project-deduplicate.py or project-standardise.py module and modify this copy to suit your data set(s). That's it at this stage. To run a project module, for example your myproject.py, simply type
python myproject.py
In order to be able to run Febrl in parallel you need to have MPI and Pypar installed on your system. Please see Chapter 17 for more details on parallelism.
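Before you try a parallel run, you can check whether Pypar is visible to Python and whether an MPI launcher is on your path. This sketch assumes Python 3.4 or later. The function name is hypothetical, and the launcher names mpirun and mpiexec are an assumption: they are common, but the actual names depend on your MPI implementation. The sketch only locates the Pypar module without importing it, so it cannot show that MPI and Pypar actually work together.

```python
import importlib.util
import shutil


def parallel_prerequisites():
    """Report whether Pypar and an MPI launcher appear to be installed."""
    return {
        # find_spec locates the module without importing (and so initialising) it.
        "pypar": importlib.util.find_spec("pypar") is not None,
        # Assumption: the MPI installation provides 'mpirun' or 'mpiexec'.
        "mpi_launcher": any(shutil.which(cmd) is not None for cmd in ("mpirun", "mpiexec")),
    }


if __name__ == "__main__":
    for item, found in sorted(parallel_prerequisites().items()):
        print("%-12s %s" % (item, "found" if found else "NOT found"))
```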