Repackage the code so it can be installed using the standard
Python distutils module.
Add the functionality to compile proper output data sets,
similar to how input and temporary data sets are defined.
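A minimal Python 3 sketch of what a declared output data set could look like, assuming a CSV file format. The class name `OutputCSVDataSet` and its `write_record()` method are hypothetical and not part of Febrl. They only illustrate the idea that an output data set, like an input or temporary one, fixes its field names when it is defined.

```python
import csv
import io


class OutputCSVDataSet:
    """Hypothetical output data set (not part of Febrl) writing CSV rows."""

    def __init__(self, name, file_obj, field_names):
        self.name = name
        self.field_names = list(field_names)
        self._writer = csv.DictWriter(file_obj, fieldnames=self.field_names)
        self._writer.writeheader()
        self.num_records = 0

    def write_record(self, record):
        # Reject fields that were not declared when the data set was defined.
        unknown = set(record) - set(self.field_names)
        if unknown:
            raise ValueError("Unknown fields: %s" % sorted(unknown))
        self._writer.writerow(record)
        self.num_records += 1


if __name__ == "__main__":
    buf = io.StringIO()
    out = OutputCSVDataSet("linked-pairs", buf, ["rec_id_a", "rec_id_b", "weight"])
    out.write_record({"rec_id_a": "a-17", "rec_id_b": "b-203", "weight": 14.2})
    print(buf.getvalue())
```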
For the display record pairs output form, add the possibility to
display the field comparison weights on the right side for the
fields that were used in the comparison functions. Currently
only the final weight is displayed for each record pair.
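One way the extended display could be laid out is sketched below. The function `format_record_pair()` is hypothetical, and it assumes (as in Fellegi-Sunter style matching) that the final weight is the sum of the individual field comparison weights. Fields not used by any comparison function simply get an empty weight column.

```python
def format_record_pair(rec_a, rec_b, field_weights, width=16):
    """Return two records side by side with per-field weights on the right.

    Hypothetical helper: rec_a and rec_b map field names to values, and
    field_weights maps the fields used in comparisons to their weights.
    The final weight is assumed to be the sum of the field weights.
    """
    lines = []
    for field in rec_a:
        weight = field_weights.get(field)
        weight_str = "%8.3f" % weight if weight is not None else ""
        line = "%-12s %-*s %-*s %s" % (
            field, width, str(rec_a[field]),
            width, str(rec_b.get(field, "")), weight_str)
        lines.append(line.rstrip())  # no trailing blanks for unweighted fields
    total = sum(field_weights.values())
    lines.append("%-12s %-*s %-*s %8.3f" % ("Final weight", width, "", width, "", total))
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_record_pair(
        {"surname": "miller", "given_name": "peter", "postcode": "2600"},
        {"surname": "millar", "given_name": "peter", "postcode": "2600"},
        {"surname": 2.5, "given_name": 4.0},
    ))
```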
Continue to add unit tests to all modules and programs in the
Febrl system, using the standard Python doctest and unittest
modules. Of course, some unit tests will require example data
sets to process so that the results can be compared against the
expected results.
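The sketch below shows how the two standard modules can be combined for one function. The function `clean_value()` is a hypothetical stand-in for a Febrl routine: its docstring examples are checked by doctest, while a unittest test case compares results against a small table of expected values, much as a test driven by an example data set would.

```python
import doctest
import io
import unittest


def clean_value(value):
    """Lower-case a field value and collapse runs of whitespace.

    >>> clean_value("  Peter   MILLER ")
    'peter miller'
    >>> clean_value("")
    ''
    """
    return " ".join(value.lower().split())


class CleanValueTest(unittest.TestCase):
    """unittest cases comparing results against expected values."""

    def test_example_values(self):
        examples = [("Dr  Jane Doe", "dr jane doe"), ("SMITH", "smith")]
        for raw, expected in examples:
            self.assertEqual(clean_value(raw), expected)


def run_all_tests():
    """Run the doctest examples and the unittest cases; True if all pass."""
    # Doctests: extract and run the >>> examples in clean_value's docstring.
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    doctest_failed = 0
    for test in finder.find(clean_value, "clean_value",
                            globs={"clean_value": clean_value}):
        doctest_failed += runner.run(test).failed
    # unittest: run the test case class, capturing its report in a buffer.
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(CleanValueTest)
    result = unittest.TextTestRunner(stream=io.StringIO(), verbosity=0).run(suite)
    return doctest_failed == 0 and result.wasSuccessful()


if __name__ == "__main__":
    print(run_all_tests())
```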
Ensure that the system can process Unicode strings correctly.
Note that Python (since version 1.6) has facilities for working
with Unicode data, but the current Febrl program code
has not been written with Unicode strings in mind and may need
to be modified. In addition, transliteration tables could be
implemented so that data sets encoded in different alphabets,
e.g. Khmer and Roman, can be linked.
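A minimal Python 3 sketch of Unicode normalisation combined with a transliteration table. The table `CYRILLIC_TO_LATIN` is a deliberately tiny, hypothetical example chosen because it is short and easy to read; a real table for Khmer (or any other script) would be far larger and would need expert input. The approach is lossy by design: accents are dropped so that, for example, "José" and "Jose" compare equal.

```python
import unicodedata

# Hypothetical, illustrative transliteration table (lower-case Cyrillic to Latin).
CYRILLIC_TO_LATIN = str.maketrans({
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e",
    "и": "i", "к": "k", "л": "l", "м": "m", "н": "n", "о": "o",
    "п": "p", "р": "r", "с": "s", "т": "t", "у": "u", "ф": "f",
    "х": "kh", "ш": "sh", "я": "ya",
})


def normalise_for_linkage(value, table=CYRILLIC_TO_LATIN):
    """Return a comparison key for a Unicode field value."""
    # NFKC folds compatibility characters (e.g. full-width letters) together.
    value = unicodedata.normalize("NFKC", value).lower()
    value = value.translate(table)
    # NFKD splits accented letters into base letter plus combining marks,
    # which are then dropped.
    decomposed = unicodedata.normalize("NFKD", value)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


if __name__ == "__main__":
    print(normalise_for_linkage("Петров"), normalise_for_linkage("José"))
```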
Add the ability to read and write data from and to ODBC data
sources, as well as to read and write data as XML
documents.
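The XML half of this item can be sketched with the standard `xml.etree.ElementTree` module; ODBC access would need a third-party driver module and is not shown. The element layout (one `<record>` per record, one child element per field) and the function names are assumptions for illustration, not an established Febrl format. All values come back as strings, and field names must be valid XML element names.

```python
import xml.etree.ElementTree as ET


def records_to_xml(records, root_tag="records", record_tag="record"):
    """Serialise a list of dicts to an XML string, one element per field."""
    root = ET.Element(root_tag)
    for rec in records:
        rec_elem = ET.SubElement(root, record_tag)
        for field, value in rec.items():
            ET.SubElement(rec_elem, field).text = "" if value is None else str(value)
    return ET.tostring(root, encoding="unicode")


def xml_to_records(xml_text, record_tag="record"):
    """Parse XML produced by records_to_xml() back into a list of dicts."""
    root = ET.fromstring(xml_text)
    # Empty elements parse with text None, so map that back to "".
    return [{child.tag: child.text or "" for child in rec_elem}
            for rec_elem in root.iter(record_tag)]


if __name__ == "__main__":
    xml_text = records_to_xml([{"surname": "miller", "postcode": "2600"}])
    print(xml_text)
    print(xml_to_records(xml_text))
```

Note that ElementTree is not hardened against maliciously constructed XML, so untrusted input would need extra care.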
Add the ability to read multi-line and maybe even
hierarchical-format files (since hierarchical databases are
still used by a lot of mainframe data processing systems written
in COBOL etc.). Being able to write these formats is another
matter, however.
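Reading such files usually means tracking the most recent parent record while scanning the lines. The sketch below assumes a hypothetical fixed-width layout in which column 0 holds a record type: `H` (household header) lines are followed by `P` (person) lines belonging to that household. Each person is flattened into one record that carries its header fields, which is the shape a linkage program would want.

```python
# Hypothetical fixed-width layout (0-based, end-exclusive column ranges).
HEADER_LAYOUT = {"household_id": (1, 6), "postcode": (6, 10)}
PERSON_LAYOUT = {"surname": (1, 13), "given_name": (13, 23)}


def parse_fields(line, layout):
    """Slice a fixed-width line into named, whitespace-stripped fields."""
    return {name: line[start:end].strip() for name, (start, end) in layout.items()}


def read_hierarchical(lines):
    """Yield one flat dict per person record, merged with its header."""
    current_header = None
    for line_num, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line:
            continue
        rec_type = line[0]
        if rec_type == "H":
            current_header = parse_fields(line, HEADER_LAYOUT)
        elif rec_type == "P":
            if current_header is None:
                raise ValueError("Line %d: person record before any header" % line_num)
            yield {**current_header, **parse_fields(line, PERSON_LAYOUT)}
        else:
            raise ValueError("Line %d: unknown record type %r" % (line_num, rec_type))


if __name__ == "__main__":
    sample = ["H000422600",
              "P" + "miller".ljust(12) + "peter".ljust(10),
              "P" + "miller".ljust(12) + "mary".ljust(10)]
    for record in read_hierarchical(sample):
        print(record)
```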
Improve parallelism in all modules to make the Febrl
system scalable. Explore parallel environments other than
PyPar, e.g. PyRO.
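Independently of the message-passing layer used (PyPar, PyRO or something else), scalability depends on spreading the comparison work evenly. The sketch below assumes the work has already been split into blocks and that a block's cost is the number of record pairs it generates. It assigns blocks to workers with the greedy longest-processing-time-first heuristic. Only the partitioning step is shown; the function name and input format are hypothetical.

```python
import heapq


def partition_blocks(block_sizes, num_workers):
    """Assign blocks to workers, balancing record-pair comparisons.

    block_sizes maps a block key to (num_records_a, num_records_b).
    Blocks are handed out largest first, each to the least-loaded worker.
    """
    costs = {key: a * b for key, (a, b) in block_sizes.items()}
    heap = [(0, worker) for worker in range(num_workers)]  # (load, worker id)
    heapq.heapify(heap)
    assignment = {worker: [] for worker in range(num_workers)}
    for key in sorted(costs, key=costs.get, reverse=True):
        load, worker = heapq.heappop(heap)  # least-loaded worker so far
        assignment[worker].append(key)
        heapq.heappush(heap, (load + costs[key], worker))
    return assignment


if __name__ == "__main__":
    blocks = {"miller": (10, 10), "smith": (8, 8), "jones": (6, 6), "brown": (5, 5)}
    print(partition_blocks(blocks, 2))
```

The greedy heuristic is not guaranteed to be optimal, but it is cheap and usually gives a reasonable balance when there are many more blocks than workers.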
Add methods so that projects can print their own
configurations in a neat format, i.e. projects document
themselves.
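A minimal sketch of a project that describes its own configuration. The `Project` class here is a hypothetical stand-in (Febrl's real project class has different attributes), and `describe()` simply lists the settings in sorted order with aligned names.

```python
class Project:
    """Hypothetical stand-in for a Febrl project object."""

    def __init__(self, name, **settings):
        self.name = name
        self.settings = settings

    def describe(self):
        """Return the project's configuration as an aligned listing."""
        lines = ["Project: %s" % self.name]
        if self.settings:
            width = max(len(key) for key in self.settings)
            for key in sorted(self.settings):
                lines.append("  %-*s : %r" % (width, key, self.settings[key]))
        return "\n".join(lines)

    # print(project) then produces the same neat listing.
    __str__ = describe


if __name__ == "__main__":
    print(Project("example", block_method="soundex", output_file="pairs.csv"))
```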
Add .Save() and .Load() methods to the project class to
serialise and de-serialise all the attributes of a project. This
would allow projects to be set up, run and saved to a file, and
later reloaded interactively from the Python prompt as well as
from script files as above.
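One possible implementation uses the standard `pickle` module. This is a sketch under assumptions: the `Project` class is hypothetical, and the methods pickle the instance's attribute dictionary rather than the object itself, so a saved file does not depend on where the class was defined, only on the attribute values being picklable.

```python
import os
import pickle
import tempfile


class Project:
    """Hypothetical project class with Save() and Load() methods."""

    def __init__(self, name, **settings):
        self.name = name
        self.settings = settings

    def Save(self, file_name):
        """Serialise all of the project's attributes to file_name."""
        with open(file_name, "wb") as f:
            pickle.dump(self.__dict__, f)

    @classmethod
    def Load(cls, file_name):
        """Create a project from attributes previously written by Save()."""
        with open(file_name, "rb") as f:
            state = pickle.load(f)
        project = cls.__new__(cls)  # skip __init__; attributes come from the file
        project.__dict__.update(state)
        return project


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "example_project.pkl")
    Project("example", threshold=12.5).Save(path)
    print(Project.Load(path).settings)  # {'threshold': 12.5}
```

Because unpickling can execute arbitrary code, `Load()` should only be used on project files from a trusted source.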
Currently Febrl is oriented towards batch
processing using modules invoked from the command line (or from
a batch file or shell script). This is probably the most useful
interface for biomedical researchers. However, later versions
may offer other APIs, such as an object-oriented Python API and
Web service interfaces (via XML-RPC, SOAP, HL-7 or CorbaMed), in
order to facilitate the embedding of Febrl in other
systems such as cancer registry databases or even Patient Master
Indexes (PMIs). The standard C implementation of Python can
itself be embedded quite easily (and freely) in other
software. Although we haven't tried it, Febrl should
work under Jython, the Java implementation of the Python
language, which would make it easy to embed in Java-based systems.
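Of the Web service options mentioned, XML-RPC is supported directly by the Python 3 standard library. The sketch below exposes a single hypothetical comparison function, `compare_names()`, over XML-RPC and calls it from a client in the same process. It is only meant to show the shape of such an interface; a real service would expose Febrl's own functions and add authentication.

```python
import threading
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCServer


def compare_names(name_a, name_b):
    """Hypothetical comparison: 1.0 for an exact match after cleaning, else 0.0."""
    return 1.0 if name_a.strip().lower() == name_b.strip().lower() else 0.0


def start_server(host="127.0.0.1", port=0):
    """Start an XML-RPC server in a background thread; port 0 picks a free port."""
    server = SimpleXMLRPCServer((host, port), logRequests=False)
    server.register_function(compare_names, "compare_names")
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server


if __name__ == "__main__":
    server = start_server()
    host, port = server.server_address  # actual port chosen by the OS
    with xmlrpc.client.ServerProxy("http://%s:%d/" % (host, port)) as proxy:
        print(proxy.compare_names("Miller ", "miller"))  # prints 1.0
    server.shutdown()
    server.server_close()
```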
Febrl - Freely extensible biomedical record linkage