E.3 Febrl System

Repackage the code so it can be installed using the standard Python distutils module.
Add functionality to define proper output data sets, in the same way that input and temporary data sets are defined.
For the "display record pairs" output form, add the option to show, on the right-hand side, the field comparison weights of the fields that were used in the comparison functions. Currently only the final weight is displayed for each record pair.
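One possible layout for this output is sketched below. It is a hypothetical illustration, not Febrl's display code. It assumes records are dicts with the same fields in both records, that `field_weights` holds weights only for the fields used in comparison functions, and that the final weight is the sum of the field weights, as in the Fellegi-Sunter approach. The function name `format_record_pair` is made up for this example.

```python
def format_record_pair(rec_a, rec_b, field_weights, width=20):
    """Format a record pair side by side, with per-field weights on the right.

    rec_a, rec_b:  dicts mapping field name -> value (same fields assumed)
    field_weights: dict mapping field name -> comparison weight, containing
                   only the fields that were used in comparison functions
    """
    lines = []
    for field in rec_a:
        a_val = str(rec_a.get(field, ""))
        b_val = str(rec_b.get(field, ""))
        line = f"{field:<12}{a_val:<{width}}{b_val:<{width}}"
        if field in field_weights:
            # Right-align the field weight after the two value columns.
            line += f"{field_weights[field]:>8.2f}"
        lines.append(line.rstrip())
    # Assumption: the final weight is the sum of the field weights.
    total = sum(field_weights.values())
    lines.append(f"{'Total weight:':<{12 + 2 * width}}{total:>8.2f}")
    return "\n".join(lines)
```

Fields without a comparison function (such as a postcode that was only used for blocking) are still shown, just without a weight column entry.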
Continue adding unit tests to all modules and programs in the Febrl system, using both the standard Python doctest module and the standard unittest module. Some unit tests will of course require example data sets to process, so that their results can be compared against the expected results.
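The two testing styles can live in the same module. The sketch below uses a hypothetical comparison function, `exact_compare` (not part of Febrl), with doctests embedded in its docstring and a `unittest.TestCase` alongside it. The `load_tests` hook is unittest's standard protocol for adding extra tests, here used to collect the doctests as well.

```python
import doctest
import unittest


def exact_compare(val1, val2, agree_weight=2.0, disagree_weight=-3.0):
    """Return agree_weight if the two values are equal, else disagree_weight.

    >>> exact_compare("miller", "miller")
    2.0
    >>> exact_compare("miller", "millar")
    -3.0
    """
    return agree_weight if val1 == val2 else disagree_weight


class ExactCompareTest(unittest.TestCase):
    def test_agree(self):
        self.assertEqual(exact_compare("2602", "2602"), 2.0)

    def test_custom_disagree_weight(self):
        self.assertEqual(exact_compare("a", "b", disagree_weight=-5.0), -5.0)


def load_tests(loader, tests, pattern):
    # unittest's load_tests protocol: also collect this module's doctests.
    # DocTestSuite() without arguments uses the calling module, i.e. this one.
    tests.addTests(doctest.DocTestSuite())
    return tests
```

If the module were saved as, say, `test_compare.py` (a hypothetical name), running `python -m unittest -v test_compare` would execute both the test case methods and the doctests.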
Ensure that the system can process Unicode strings correctly. Python has had facilities for working with Unicode data since version 1.6, but the current Febrl code was not written with Unicode strings in mind and may need to be modified. Furthermore, it may be possible to implement transliteration tables that allow data sets encoded in different alphabets (e.g. Khmer and Roman) to be linked.
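A minimal sketch of both ideas in Python 3 (where `str` is always Unicode) follows. `unicodedata.normalize` and `unicodedata.combining` are standard library calls; the transliteration table is a hypothetical, deliberately tiny Greek-to-Latin example, chosen only because a handful of letters fits in a short table. A real table for Khmer or any other script would be far larger and would likely need context-dependent rules that a simple character mapping cannot express.

```python
import unicodedata

# Hypothetical, incomplete transliteration table (Greek -> Latin).
# A real table would cover the whole alphabet and handle context rules.
GREEK_TO_LATIN = str.maketrans({
    "α": "a", "β": "b", "γ": "g", "δ": "d", "ε": "e", "ι": "i",
    "κ": "k", "λ": "l", "μ": "m", "ν": "n", "ο": "o", "π": "p",
    "ρ": "r", "σ": "s", "ς": "s", "τ": "t",
})


def normalise_name(name):
    """Lower-case, strip accents and transliterate a name for comparison."""
    # NFKD decomposes accented characters, e.g. "é" -> "e" + combining acute.
    decomposed = unicodedata.normalize("NFKD", name.lower())
    # Dropping the combining marks makes "José" and "Jose" compare equal.
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Finally map the remaining non-Roman letters via the table.
    return stripped.translate(GREEK_TO_LATIN)
```

For example, `normalise_name("Νίκος")` yields `"nikos"`, which can then be compared with a Roman-alphabet record using the existing string comparison functions.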
Add the ability to read and write data from and to ODBC data sources, as well as to read and write data as XML documents.
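For the XML part, Python's standard `xml.etree.ElementTree` module is sufficient. The sketch below assumes a hypothetical document layout (a `dataset` element containing `record` elements, one child element per field) and assumes field names are valid XML element names; Febrl does not define such a format. ODBC access would require a third-party driver module and is not shown.

```python
import xml.etree.ElementTree as ET


def records_to_xml(records):
    """Serialise a list of dicts (field name -> string value) as XML text."""
    root = ET.Element("dataset")
    for rec_id, rec in enumerate(records):
        rec_elem = ET.SubElement(root, "record", id=str(rec_id))
        for field, value in rec.items():
            # One child element per field; ElementTree escapes "&", "<" etc.
            ET.SubElement(rec_elem, field).text = value
    return ET.tostring(root, encoding="unicode")


def xml_to_records(xml_text):
    """Parse XML produced by records_to_xml() back into a list of dicts."""
    root = ET.fromstring(xml_text)
    # Empty elements parse with text None, so map that back to "".
    return [{child.tag: (child.text or "") for child in rec}
            for rec in root.findall("record")]
```

Because the layout is symmetric, a data set written with `records_to_xml` reads back unchanged with `xml_to_records`, which makes the pair easy to unit test.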
Add the ability to read multi-line and possibly even hierarchical-format files (hierarchical databases are still used by many mainframe data processing systems written in COBOL and similar languages). Writing these formats is a separate matter, however.
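The core of a multi-line reader is grouping physical lines into logical records. The sketch below assumes a hypothetical layout in which every line starts with a record-type code followed by `|`-separated fields, and a header code (here `P`) starts each new record; all other lines are children of the most recent header. Real mainframe files are often fixed-width and may use other encodings such as EBCDIC, which this sketch does not handle.

```python
def read_multiline_records(lines, header_code="P"):
    """Group lines into hierarchical records.

    Assumed (hypothetical) layout: "<type code>|field|field|...".
    A line whose type code equals header_code starts a new record;
    any other line is a child of the most recent header line.
    """
    records = []
    current = None
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        code, _, rest = line.partition("|")
        fields = rest.split("|")
        if code == header_code:
            current = {"fields": fields, "children": []}
            records.append(current)
        elif current is None:
            raise ValueError("child line before first header: %r" % line)
        else:
            current["children"].append((code, fields))
    return records
```

A person with two address lines, for example, becomes one record whose `children` list holds both addresses, which could then be flattened into one or more Febrl input records.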
Improve parallelism in all modules to make the Febrl system scalable. Explore parallel environments other than PyPar, e.g. PyRO.
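Scalability depends largely on spreading the record pair comparisons evenly over the available processes. As one illustration, independent of PyPar, PyRO or Febrl's own code, the sketch below assigns blocks to workers with the greedy "longest processing time first" heuristic. It assumes the cost of a block is the number of record pairs it produces when linking two data sets (for deduplication it would be n(n-1)/2 instead). The function name and input format are hypothetical.

```python
import heapq


def partition_blocks(block_sizes, num_workers):
    """Assign blocks to workers so that comparison work is roughly balanced.

    block_sizes: dict mapping block key -> (records in set A, records in set B)
    Returns a dict mapping worker index -> list of block keys.
    """
    # Assumed cost model: number of record pairs produced by each block.
    costs = {key: a * b for key, (a, b) in block_sizes.items()}
    # Min-heap of (current load, worker index): least-loaded worker on top.
    heap = [(0, w) for w in range(num_workers)]
    assignment = {w: [] for w in range(num_workers)}
    # Largest blocks first, each to the currently least-loaded worker.
    for key in sorted(costs, key=costs.get, reverse=True):
        load, worker = heapq.heappop(heap)
        assignment[worker].append(key)
        heapq.heappush(heap, (load + costs[key], worker))
    return assignment
```

Each worker's list of block keys could then be handed to a separate process, whichever parallel environment is used.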
Use Pyrex (or something similar) to speed up execution, see http://www.cosc.canterbury.ac.nz/~greg/python/Pyrex/. An alternative would be to use Psyco, see http://psyco.sourceforge.net/.
Notes on the object orientation of Febrl
1. Add methods so that projects can print their own configurations in a neat format, i.e. so that they document themselves.
2. Add .Save() and .Load() methods to the project class that serialise and de-serialise all of the project's attributes. This would allow a project to be set up, run and saved to a file, and later reloaded interactively from the Python prompt as well as from script files as above.
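Both notes could be met by a small amount of code in the project class. The sketch below is a hypothetical minimal `Project` class, not Febrl's actual one: `describe()` is an illustrative name for the self-documenting method, while `Save()` and `Load()` follow the names used above and are implemented here with the standard `pickle` module. Since unpickling can execute arbitrary code, `Load()` should only be used on trusted files.

```python
import pickle


class Project:
    """Hypothetical minimal project class illustrating the two notes above."""

    def __init__(self, name, **config):
        self.name = name
        self.config = dict(config)

    def describe(self):
        """Return the project's configuration as neatly aligned text."""
        width = max((len(key) for key in self.config), default=0)
        lines = ["Project: %s" % self.name]
        for key in sorted(self.config):
            lines.append("  %-*s = %r" % (width, key, self.config[key]))
        return "\n".join(lines)

    def Save(self, path):
        """Serialise all attributes of the project to a file."""
        with open(path, "wb") as f:
            pickle.dump(self.__dict__, f)

    @classmethod
    def Load(cls, path):
        """Re-create a project from a file written by Save()."""
        with open(path, "rb") as f:
            state = pickle.load(f)
        project = cls.__new__(cls)  # bypass __init__; state is restored below
        project.__dict__.update(state)
        return project
```

At the Python prompt, `Project.Load("linkage.pkl")` (a hypothetical file name) followed by `print(project.describe())` would then restore a saved project and show its configuration.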
Currently Febrl is oriented towards batch processing using modules invoked from the command line (or from a batch file or shell script), which is probably the most useful interface for biomedical researchers. However, later versions may offer other APIs, such as an object-oriented Python API and Web service interfaces (via XML-RPC, SOAP, HL7 or CorbaMed), to facilitate embedding Febrl in other systems such as cancer registry databases or even Patient Master Indexes (PMIs). The C implementation of Python (CPython) can itself be embedded quite easily (and freely) in other software. Although we have not tried it, Febrl should work under Jython, the Java implementation of Python, which would make it easy to embed in Java-based systems.
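Of the Web service options listed, XML-RPC is supported directly by Python's standard library. The sketch below exposes a hypothetical comparison function, `compare_names` (a stand-in, not a Febrl function), through `xmlrpc.server.SimpleXMLRPCServer`, and shows how a client in another system could call it through `xmlrpc.client.ServerProxy`. Port 0 asks the operating system for any free port; a production service would also need authentication and error handling.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def compare_names(name1, name2):
    # Hypothetical stand-in for a Febrl field comparison function.
    return 1.0 if name1.strip().lower() == name2.strip().lower() else 0.0


def start_server(host="localhost", port=0):
    """Serve compare_names over XML-RPC in a background thread."""
    server = SimpleXMLRPCServer((host, port), logRequests=False)
    server.register_function(compare_names, "compare_names")
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server  # call server.shutdown() to stop it


def remote_compare(url, name1, name2):
    """Client side: call the remote comparison function."""
    with ServerProxy(url) as proxy:
        return proxy.compare_names(name1, name2)
```

After `server = start_server()`, the service URL is `"http://%s:%d/" % server.server_address`, and any XML-RPC client, written in Python or another language, can call `compare_names` at that address.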