Front Matter

Copyright © 2002-2005 Australian National University. All rights reserved.

See Appendix H of this document for the license conditions under which this document and the computer programs described in it may be used.


This manual describes prototype software called Febrl designed to undertake probabilistic data cleaning and standardisation, deduplication and record linkage. Written in the Python programming language, this software aims to allow health, biomedical and other researchers to clean (standardise) and deduplicate or link data sets of all sizes faster, with less effort and with improved quality.

This fifth release, Febrl Version 0.3.1, introduces geocoding as its main new feature, along with several smaller updated or improved features. The main features of the current release are:

The authors would be grateful if users of Febrl would inform us (by e-mail) of how they have used the system. We are particularly interested in references to scientific papers or reports which mention or cite Febrl (see Citing Febrl below).

Citing Febrl
If you want to refer to Febrl in a publication, please cite our PAKDD-2004 paper Febrl - A Parallel Open Source Data Linkage System. The full citation is:

Febrl - A Parallel Open Source Data Linkage System
Peter Christen, Tim Churches and Markus Hegland
Proceedings of the 8th Pacific-Asia Conference, PAKDD 2004, Sydney,
Australia, May 26-28, 2004. Pages 638-647.
Springer Lecture Notes in Artificial Intelligence, Volume 3056.

This document is subject to the ANUOS License Version 1.2 (the License, see Appendix H of this document); you may not use this document except in compliance with the License. All Febrl computer program code and associated data files and documentation, including this document, are distributed under the License on an AS IS basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License for the specific language governing rights and limitations under the License.