D. Manifest

The following files are provided with the current distribution of Febrl (Version 0.2.2).

The main directory contains the Python programs, license files and documentation. PDF and compressed (gzipped) PostScript versions of the manual are available for download from the Febrl web site, but they are not included in the standard distribution because of their large size.

      ANUOS_v1.1.txt
      LICENSE.txt
      README.txt
      address.py
      classification.py
      classificationTest.py
      comparison.py
      comparisonTest.py
      dataset.py
      datasetTest.py
      date.py
      dateTest.py
      encode.py
      encodeTest.py
      febrl.py
      indexing.py
      lap.py
      lapTest.py
      lookup.py
      lookupTest.py
      mymath.py
      mymathTest.py
      name.py
      output.py
      parallel.py
      project-deduplicate.py
      project-linkage.py
      project-standardise.py
      randomselect.py
      simplehmm.py
      simplehmmTest.py
      standardisation.py
      stringcmp.py
      stringcmpTest.py
      tagdata.py
      tcsv.py
      trainhmm.py

The hmm/ directory contains some example hidden Markov model (HMM) training data sets ('.csv' files) and some example HMMs derived from them. The training data has been derived from files of NSW death certificates and MDC (Midwives Data Collection) data. These example files should work adequately with most Australian name data and NSW address data. The tagging look-up tables in the data/ directory will need to be modified to suit other states of Australia or other countries. In future versions we plan to include look-up tables and example training sets that are suitable for initial use anywhere in Australia. We are also happy to include example files for other countries if these are contributed.

      address-absdiscount.hmm
      address-laplace.hmm
      address-sample-training-data.csv
      address.hmm
      hmm-states.txt
      name-absdiscount.hmm
      name-laplace.hmm
      name-sample-training-data.csv
      name.hmm

The data/ directory contains look-up tables, correction lists and frequency tables.

      address_corr.lst
      address_misc.tbl
      address_qual.tbl
      country.tbl
      givenname_f.tbl
      givenname_f_freq.csv
      givenname_m.tbl
      givenname_m_freq.csv
      institution_type.tbl
      name_corr.lst
      name_misc.tbl
      name_prefix.tbl
      post_address.tbl
      postcode_centroids.csv
      saints.tbl
      surname.tbl
      territory.tbl
      title.tbl
      unit_type.tbl
      wayfare_type.tbl

The dbgen/ directory contains the database generator generate.py and all its associated files. Frequency files are stored in the sub-directory dbgen/data/.

      README.txt
      dataset1.csv
      dataset2.csv
      dataset3.csv
      dataset4a.csv
      dataset4b.csv
      generate.py
      data/address1.csv
      data/address2.csv
      data/givenname.csv
      data/postcode.csv
      data/state.csv
      data/streetnumber.csv
      data/suburb.csv
      data/surname.csv
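
The lists above can be used to check that an unpacked distribution is complete. The following is a minimal sketch only, and it is not part of Febrl. It assumes Python 3 and that the hmm/ and dbgen/ directories sit directly under the distribution root. The names MANIFEST and missing_files are hypothetical, and only the hmm/ and dbgen/ lists are reproduced here; the main directory and data/ lists could be added in the same way.

```python
from pathlib import Path

# Partial manifest copied from the lists above: directory -> file names.
# Assumption: these directories sit directly under the distribution root.
MANIFEST = {
    "hmm": [
        "address-absdiscount.hmm",
        "address-laplace.hmm",
        "address-sample-training-data.csv",
        "address.hmm",
        "hmm-states.txt",
        "name-absdiscount.hmm",
        "name-laplace.hmm",
        "name-sample-training-data.csv",
        "name.hmm",
    ],
    "dbgen": [
        "README.txt",
        "dataset1.csv",
        "dataset2.csv",
        "dataset3.csv",
        "dataset4a.csv",
        "dataset4b.csv",
        "generate.py",
    ],
    "dbgen/data": [
        "address1.csv",
        "address2.csv",
        "givenname.csv",
        "postcode.csv",
        "state.csv",
        "streetnumber.csv",
        "suburb.csv",
        "surname.csv",
    ],
}


def missing_files(root, manifest=MANIFEST):
    """Return the sorted relative paths listed in manifest but absent under root."""
    root = Path(root)
    missing = []
    for directory, names in manifest.items():
        for name in names:
            relative = Path(directory) / name
            # Only regular files count as present; directories do not.
            if not (root / relative).is_file():
                missing.append(relative.as_posix())
    return sorted(missing)
```

Calling missing_files with the path of the unpacked distribution returns an empty list when every listed file is present, and otherwise the relative paths of the files that could not be found.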