Febrl - Freely extensible biomedical record linkage
Contents
1. Acknowledgments
2. Introduction
2.1 Performance
3. Data Cleaning and Record Linkage
4. System Overview
5. Configuration and Running Febrl using a Module derived from 'project.py'
6. Data Cleaning and Standardisation
6.1 Name and Address Cleaning and Standardisation
6.1.1 Step 1: Cleaning
6.1.2 Step 2: Tagging
6.1.3 Step 3: Segmentation
6.1.4 Word Spilling
6.2 Output Fields
6.3 Name Cleaning and Standardisation using a Rules Based Approach
6.4 Name Cleaning and Standardisation using a Hidden Markov Model Based Approach
6.5 Address Cleaning and Standardisation using a Hidden Markov Model Based Approach
6.6 Date Cleaning and Standardisation
6.7 Phone Number Cleaning and Standardisation
6.8 Field Passing
6.9 Record Cleaning and Standardisation
6.10 Starting a Standardisation Process
7. Hidden Markov Models for Data Standardisation
7.1 Hidden Markov Model Implementation Module 'simplehmm.py'
8. Hidden Markov Model Training
8.1 Program 'tagdata.py'
8.2 Program 'trainhmm.py'
9. Record Linkage and Deduplication
9.1 Indexing
9.1.1 Block Indexing
9.1.2 Sorting Indexing
9.1.3 Bigram Indexing
9.2 Field Comparison Functions
9.2.1 Frequency Dependent Weight Calculation
9.2.2 Exact String Comparison 'FieldComparatorExactString'
9.2.3 Truncated String Comparison 'FieldComparatorTruncateString'
9.2.4 Approximate String Comparison 'FieldComparatorApproxString'
9.2.5 Encoded String Comparison 'FieldComparatorEncodeString'
9.2.6 Keying Difference Comparison 'FieldComparatorKeyDiff'
9.2.7 Numeric Comparison with Percentage Tolerance 'FieldComparatorNumericPerc'
9.2.8 Numeric Comparison with Absolute Tolerance 'FieldComparatorNumericAbs'
9.2.9 Date Comparison with Day Tolerance 'FieldComparatorDate'
9.2.10 Age Comparison with Percentage Tolerance 'FieldComparatorAge'
9.2.11 Time Comparison with Minute Tolerance 'FieldComparatorTime'
9.2.12 Distance Comparison with Kilometer Tolerance 'FieldComparatorDistance'
9.3 Record Comparator
9.4 Example Field and Record Comparator Initialisation
9.5 Classification
9.5.1 Fellegi and Sunter Classifier
9.5.2 Flexible Classifier
9.6 Starting a Linkage or Deduplication Process
10. Geocoding
10.1 G-NAF - A Geocoded National Address File
10.2 Outline of the Febrl Geocoding Process
10.2.1 Processing the G-NAF Files
10.2.2 Additional Data Files
10.2.3 Fuzzy Matching Engine
10.3 Program 'process-gnaf.py'
10.3.1 Memory Usage and Performance of 'process-gnaf.py'
10.4 Geocoding Project Module 'project-geocode.py'
10.5 Auxiliary Geocoding Programs
10.5.1 Creating Neighbouring Look-up Tables with Program 'get-neighbour-regions.py'
10.5.2 Reverse G-NAF Look-up with Program 'reverse-gnaf.py'
10.5.3 Geocoding To-Do
11. Output
11.1 Record Pair One-To-One Assignment Restrictions
12. Auxiliary Programs
12.1 Program 'randomselect.py'
12.2 Data Set Generator Program 'generate.py'
12.3 File Analysis Program 'fileanalysis.py'
13. Data Set Access
13.1 COL Data Set Implementation
13.2 CSV Data Set Implementation
13.3 SQL Data Set Implementation
13.4 Shelve Data Set Implementation
13.5 Memory Data Set Implementation
14. Look-up and Frequency Tables
14.1 Correction List
14.2 Tagging Look-up Table
14.3 Frequency Look-up Table
14.4 Geographic Location Look-up Table
14.5 Neighbouring Region Look-up Table
15. Logging and Console Output
16. Installation
17. Parallelism
A. Hidden Markov Model States
B. List of Tags
C. Rule-based Name Segmentation
C.1 Input
C.2 Process Overview
C.2.1 Step 1: Allocating the elements into one of the five sub-lists
C.2.2 Step 2: Parse each sub-list and assign into appropriate output name component
C.3 Output
D. Manifest
E. To-Do: Outstanding Development Tasks, Possible Additions and Enhancements
E.1 Data Cleaning and Standardisation
E.2 Record Linkage and Deduplication
E.3 Febrl System
F. Version History
G. Support Arrangements
H. ANU - Open Source License
Bibliography
Index
Release 0.3.1, documentation updated on July 1, 2005.