Index


a | b | c | d | e | f | g | h | i | j | l | m | n | o | p | r | s | t | u | w

A

address.py, [Link]
agreement weight
ANU Data Mining Group
ANU Open Source License, [Link], [Link]
approximate string comparator, [Link]
assignment, [Link]
    Auction algorithm
    one-to-one, [Link]
AutoMatch, [Link]
AutoStan
awk


B

Berkeley database
bigram, [Link]
bigram index
biomedical research
blocking index, [Link], [Link]
blocking technique
blocking variable
bootstrapping, [Link]
bsddb3


C

categorical attributes
Centre for Epidemiology and Research
classifier, [Link]
    Fellegi and Sunter
    flexible
clerical review, [Link], [Link], [Link]
clusters of workstations
COL data set
comma separated value, [Link], [Link], [Link]
comparison.py
consent
correction list, [Link], [Link], [Link], [Link], [Link]
CSV data set


D

data cleaning, [Link], [Link], [Link], [Link], [Link], [Link], [Link], [Link], [Link], [Link], [Link]
data integrity
data items
data matching
data mining, [Link], [Link]
data quality
data scrubbing
data segmentation
data standardisation, [Link], [Link], [Link], [Link], [Link], [Link], [Link], [Link], [Link], [Link]
data tagging
data warehouse
dataset.py
date format string, [Link]
date parsing
date standardisation
date.py, [Link]
deduplication, [Link], [Link], [Link]
difflib
disagreement weight
Double-Metaphone, [Link]
download


E

edit distance
encode.py
epidemiological studies
ethics committee
ETL


F

Fellegi and Sunter
    classifier
field comparison function, [Link], [Link], [Link], [Link]
field passing
fixed column fields, [Link]
frequency table, [Link], [Link], [Link]
fuzzy technique


G

generate.py
geocode table, [Link]
greedy matching
grep/agrep


H

health data
health research
hidden Markov model, [Link], [Link], [Link], [Link], [Link]
    smoothing, [Link]
        absolute discounting
        Laplace
    states
    training, [Link]
    Viterbi algorithm, [Link], [Link], [Link]
histogram


I

indexing.py
information retrieval
installation


J

Jaro
join operation


L

list washing
look-up table, [Link], [Link], [Link], [Link], [Link]
lookup.py, [Link]


M

m-probability
machine learning, [Link], [Link], [Link]
mailing lists
match weight, [Link]
maximum likelihood estimate
Memory data set
merge/purge processing
Microsoft Windows
missing weight
Mozilla Public License, [Link]
MPI
MS-DOS
multiprocessor
MySQL, [Link]


N

name.py, [Link]
name_corr.lst
name_misc.tbl
New South Wales Department of Health, [Link]
Newcombe and Kennedy
NYSIIS, [Link], [Link]


O

object identity
open source
output dataset
output field, [Link], [Link], [Link], [Link], [Link], [Link], [Link]


P

parallel computing
parallelism
personal attributes
Phonex, [Link]
pivot year
pre-processing, [Link], [Link]
privacy
probabilistic linkage, [Link]
project.py, [Link], [Link], [Link], [Link]
Pypar
Python, [Link], [Link]
    dictionary, [Link]


R

randomselect.py, [Link]
record comparator, [Link], [Link], [Link], [Link]
record linkage, [Link]
record pair
record standardisation
regular expressions
results
rule-based, [Link], [Link]


S

scalar attributes
seqmatch
Shelve data set
shelve.py, [Link]
simplehmm.py, [Link], [Link]
simplehmmTest.py
Sleepycat Software
SNOBOL
sorting, [Link]
sorting index
Soundex, [Link], [Link], [Link]
speedup
SQL, [Link], [Link]
SQL data set
standardisation.py, [Link], [Link], [Link], [Link], [Link]
stringcmp.py
supercomputer


T

tag, [Link], [Link], [Link], [Link]
tagdata.py, [Link], [Link], [Link]
tagging table, [Link]
territory.tbl
text indexing
text segmentation
trainhmm.py, [Link], [Link]
training
training data, [Link]
transitive closure


U

u-probability
unique identifier
Unix


W

weight vector, [Link]
Winkler
word spilling, [Link], [Link], [Link], [Link]