Febrl - Freely extensible biomedical record linkage
Contents
Front Matter
1. Acknowledgments
2. Introduction
2.1 Performance
3. Data Cleaning and Record Linkage
4. System Overview
5. Configuration and Running Febrl using a Module derived from 'project.py'
6. Data Cleaning and Standardisation
6.1 Name and Address Cleaning and Standardisation
6.1.1 Step 1: Cleaning
6.1.2 Step 2: Tagging
6.1.3 Step 3: Segmentation
6.1.4 Word Spilling
6.2 Output Fields
6.3 Name Cleaning and Standardisation using a Rules Based Approach
6.4 Name Cleaning and Standardisation using a Hidden Markov Model Based Approach
6.5 Address Cleaning and Standardisation using a Hidden Markov Model Based Approach
6.6 Date Cleaning and Standardisation
6.7 Field Passing
6.8 Record Cleaning and Standardisation
6.9 Starting a Standardisation Process
7. Hidden Markov Models for Data Standardisation
7.1 Hidden Markov Model Implementation Module 'simplehmm.py'
8. Hidden Markov Model Training
8.1 Program 'tagdata.py'
8.2 Program 'trainhmm.py'
9. Record Linkage and Deduplication
9.1 Indexing
9.1.1 Block Indexing
9.1.2 Sorting Indexing
9.1.3 Bigram Indexing
9.2 Field Comparison Functions
9.2.1 Frequency Dependent Weight Calculation
9.2.2 Exact String Comparison 'FieldComparatorExactString'
9.2.3 Truncated String Comparison 'FieldComparatorTruncateString'
9.2.4 Approximate String Comparison 'FieldComparatorApproxString'
9.2.5 Encoded String Comparison 'FieldComparatorEncodeString'
9.2.6 Keying Difference Comparison 'FieldComparatorKeyDiff'
9.2.7 Numeric Comparison with Percentage Tolerance 'FieldComparatorNumericPerc'
9.2.8 Numeric Comparison with Absolute Tolerance 'FieldComparatorNumericAbs'
9.2.9 Date Comparison with Day Tolerance 'FieldComparatorDate'
9.2.10 Age Comparison with Percentage Tolerance 'FieldComparatorAge'
9.2.11 Time Comparison with Minute Tolerance 'FieldComparatorTime'
9.2.12 Distance Comparison with Kilometer Tolerance 'FieldComparatorDistance'
9.3 Record Comparator
9.4 Example Field and Record Comparator Initialisation
9.5 Classification
9.5.1 Fellegi and Sunter Classifier
9.5.2 Flexible Classifier
9.6 Starting a Linkage or Deduplication Process
10. Output
10.1 Record Pair One-To-One Assignment Restrictions
11. Auxiliary Programs
11.1 Program 'randomselect.py'
11.2 Database generator Program 'generate.py'
12. Data Set Access
12.1 COL Data Set Implementation
12.2 CSV Data Set Implementation
12.3 SQL Data Set Implementation
12.4 Shelve Data Set Implementation
12.5 Memory Data Set Implementation
13. Look-up and Frequency Tables
13.1 Correction List
13.2 Tagging Look-up Table
13.3 Frequency Look-up Table
13.4 Geographic Location Look-up Table
14. Logging and Verbose Output
15. Installation
16. Parallelism
A. Hidden Markov Model States
B. List of Tags
C. Rule-based Name Segmentation
C.1 Input
C.2 Process Overview
C.2.1 Step 1: Allocating the elements into one of the five sub-lists
C.2.2 Step 2: Parse each sub-list and assign into appropriate output name component
C.3 Output
D. Manifest
E. To-Do: Outstanding Development Tasks, Possible Additions and Enhancements
E.1 Data Cleaning and Standardisation
E.2 Record Linkage and Deduplication
E.3 Febrl System
F. Version History
G. Support Arrangements
H. ANU - Open Source License
Bibliography
Index
About this document ...
Release 0.2.2, documentation updated on November 13, 2003.