This manual describes prototype software called Febrl designed to undertake probabilistic data cleaning and standardisation, deduplication and record linkage. Written in the Python programming language, this software aims to allow health, biomedical and other researchers to clean (standardise) and deduplicate or link data sets of all sizes faster, with less effort and with improved quality.

This fifth release, Febrl Version 0.3.1, adds geocoding as its main new feature, along with several smaller new or improved features. The main features of the current release are:

The authors would be grateful if users of Febrl would let us know (by e-mail) how they have used the system. We are particularly interested in references to scientific papers or reports that mention or cite Febrl (see below for how to cite Febrl).

Citing Febrl
If you want to refer to Febrl in a publication, please cite our PAKDD-2004 paper Febrl - A Parallel Open Source Data Linkage System. The full citation is:

Febrl - A Parallel Open Source Data Linkage System
Peter Christen, Tim Churches and Markus Hegland
Proceedings of the 8th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2004), Sydney,
Australia, May 26-28, 2004, pages 638-647.
Springer Lecture Notes in Artificial Intelligence, Volume 3056.

