The following files are provided with the current distribution of Febrl (Version 0.3).
ANUOS-1.2.txt LICENSE.txt README.txt address.py classification.py classificationTest.py comparison.py comparisonTest.py dataset.py datasetTest.py date.py dateTest.py encode.py encodeTest.py febrl.py geocoding.py indexing.py lap.py lapTest.py lookup.py lookupTest.py mymath.py mymathTest.py name.py output.py parallel.py phonenum.py phonenumTest.py project-deduplicate.py project-geocode.py project-linkage.py project-standardise.py qgramindex.py qgramindexTest.py randomselect.py simplehmm.py simplehmmTest.py standardisation.py stringcmp.py stringcmpTest.py tagdata.py trainhmm.py
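Several modules in the list above have a companion file whose name ends in Test.py (for example lap.py and lapTest.py). The following is a minimal sketch, not part of Febrl, that reports which modules in a list of file names have such a companion. The function name modules_with_tests is hypothetical, and the pairing rule is only inferred from the file names shown above.

```python
def modules_with_tests(filenames):
    """Map each Python module name to its '<module>Test.py' companion.

    The value is None when no companion appears in the given list.
    The naming rule is an assumption inferred from the file listing.
    """
    names = set(filenames)
    result = {}
    for name in sorted(names):
        # Skip non-Python files and the test files themselves.
        if not name.endswith(".py") or name.endswith("Test.py"):
            continue
        candidate = name[:-3] + "Test.py"
        result[name] = candidate if candidate in names else None
    return result


if __name__ == "__main__":
    sample = ["README.txt", "febrl.py", "lap.py", "lapTest.py",
              "stringcmp.py", "stringcmpTest.py"]
    for module, test in modules_with_tests(sample).items():
        print(module, "->", test)
```

Passing the output of os.listdir() for the distribution directory instead of the sample list would give the same report for an installed copy.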
The data directory contains look-up tables, correction lists and frequency tables.
address_corr.lst address_misc.tbl address_qual.tbl country.tbl givenname_f.tbl givenname_f_freq.csv givenname_m.tbl givenname_m_freq.csv institution_type.tbl locality_name_act.tbl locality_name_nsw.tbl name_corr.lst name_misc.tbl name_prefix.tbl post_address.tbl postcode_act.tbl postcode_act_freq.csv postcode_centroids.csv postcode_nsw.tbl postcode_nsw_freq.csv saints.tbl suburb_act_freq.csv suburb_nsw_freq.csv surname.tbl surname_act_freq.csv surname_nsw_freq.csv territory.tbl title.tbl unit_type.tbl wayfare_type.tbl
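The file names above appear to follow a naming pattern: '_corr.lst' for correction lists, '.tbl' for look-up tables and '_freq.csv' for frequency tables. The sketch below groups a list of file names by that pattern. It is an illustration only: the pattern is inferred from the names in the listing, the function names kind_of and group_data_files are hypothetical, and files that do not match (such as postcode_centroids.csv) are placed in an "other" group rather than guessed at.

```python
from collections import defaultdict


def kind_of(name):
    """Guess the kind of a data file from its name (assumed convention)."""
    if name.endswith("_freq.csv"):
        return "frequency table"
    if name.endswith("_corr.lst"):
        return "correction list"
    if name.endswith(".tbl"):
        return "look-up table"
    return "other"


def group_data_files(filenames):
    """Group file names by the kind inferred from their suffix."""
    groups = defaultdict(list)
    for name in filenames:
        groups[kind_of(name)].append(name)
    return dict(groups)


if __name__ == "__main__":
    sample = ["address_corr.lst", "title.tbl", "surname_act_freq.csv",
              "postcode_centroids.csv"]
    for kind, names in group_data_files(sample).items():
        print(kind + ":", ", ".join(names))
```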
The dsgen directory contains the data set generator generate.py and all its associated files. Frequency files are stored in the sub-directory dsgen/data.
README.txt dataset1.csv dataset2.csv dataset3.csv dataset4a.csv dataset4b.csv generate.py
data/address1-freq.csv data/address2-freq.csv data/age-freq.csv data/givenname-freq.csv data/givenname-misspell.tbl data/postcode-freq.csv data/state-freq.csv data/streetnumber-freq.csv data/suburb-freq.csv data/suburb-misspell.tbl data/surname-freq.csv data/surname-misspell.tbl
The geocode directory contains programs and files needed for the Febrl geocoding system.
get-neighbour-regions.py gnaffunctions.py pc-neighbours-1.txt pc-neighbours-2.txt process-gnaf.py reverse-gnaf.py suburb-neighbours-1.txt suburb-neighbours-2.txt testaddresses-small.txt
The hmm directory contains some example hidden Markov model (HMM) training data sets ('.csv' files) and some example HMMs derived from them. The training data has been derived from files of NSW death certificates and MDC (Midwives Data Collection) data. These examples should work adequately with most Australian name data and NSW address data. The tagging look-up tables in the data directory will need to be modified to suit other states of Australia or other countries. In future versions we plan to include look-up tables and example training sets that are suitable for initial use anywhere in Australia. We are also happy to include example files for other countries if these are contributed.
address-absdiscount.hmm address-laplace.hmm address-sample-training-data.csv address.hmm geocode-nsw-address.hmm hmm-states.txt name-absdiscount.hmm name-laplace.hmm name-sample-training-data.csv name.hmm