The following files are provided with the current distribution of Febrl (Version 0.3).
ANUOS-1.2.txt
LICENSE.txt
README.txt
address.py
classification.py
classificationTest.py
comparison.py
comparisonTest.py
dataset.py
datasetTest.py
date.py
dateTest.py
encode.py
encodeTest.py
febrl.py
geocoding.py
indexing.py
lap.py
lapTest.py
lookup.py
lookupTest.py
mymath.py
mymathTest.py
name.py
output.py
parallel.py
phonenum.py
phonenumTest.py
project-deduplicate.py
project-geocode.py
project-linkage.py
project-standardise.py
qgramindex.py
qgramindexTest.py
randomselect.py
simplehmm.py
simplehmmTest.py
standardisation.py
stringcmp.py
stringcmpTest.py
tagdata.py
trainhmm.py
The data directory contains look-up tables,
correction lists and frequency tables.
address_corr.lst
address_misc.tbl
address_qual.tbl
country.tbl
givenname_f.tbl
givenname_f_freq.csv
givenname_m.tbl
givenname_m_freq.csv
institution_type.tbl
locality_name_act.tbl
locality_name_nsw.tbl
name_corr.lst
name_misc.tbl
name_prefix.tbl
post_address.tbl
postcode_act.tbl
postcode_act_freq.csv
postcode_centroids.csv
postcode_nsw.tbl
postcode_nsw_freq.csv
saints.tbl
suburb_act_freq.csv
suburb_nsw_freq.csv
surname.tbl
surname_act_freq.csv
surname_nsw_freq.csv
territory.tbl
title.tbl
unit_type.tbl
wayfare_type.tbl
The dsgen directory contains the data set generator
generate.py and all its associated files. Frequency
files are stored in the sub-directory dsgen/data.
README.txt
dataset1.csv
dataset2.csv
dataset3.csv
dataset4a.csv
dataset4b.csv
generate.py
data/address1-freq.csv
data/address2-freq.csv
data/age-freq.csv
data/givenname-freq.csv
data/givenname-misspell.tbl
data/postcode-freq.csv
data/state-freq.csv
data/streetnumber-freq.csv
data/suburb-freq.csv
data/suburb-misspell.tbl
data/surname-freq.csv
data/surname-misspell.tbl
The geocode directory contains the programs and files needed
for the Febrl geocoding system.
get-neighbour-regions.py
gnaffunctions.py
pc-neighbours-1.txt
pc-neighbours-2.txt
process-gnaf.py
reverse-gnaf.py
suburb-neighbours-1.txt
suburb-neighbours-2.txt
testaddresses-small.txt
The hmm directory contains some example hidden Markov
model (HMM) training data sets ('.csv' files) and some example
HMMs derived from them. The training data has been derived from
files of NSW death certificates and MDC (Midwives Data
Collection) data. It should work adequately with most Australian
name data and NSW address data. The tagging look-up tables in
the data directory will need to be modified to suit
other states of Australia or other countries. In future versions
we plan to include look-up tables and example training sets
which are suitable for initial use anywhere in Australia. We are
also happy to include example files for other countries if these
are contributed.
address-absdiscount.hmm
address-laplace.hmm
address-sample-training-data.csv
address.hmm
geocode-nsw-address.hmm
hmm-states.txt
name-absdiscount.hmm
name-laplace.hmm
name-sample-training-data.csv
name.hmm