Index


A

address level match
address.py, [Link]
agreement weight
ANU Data Mining Group
ANU Open Source License, [Link], [Link]
approximate string comparator, [Link]
assignment, [Link]
Auction algorithm
one-to-one, [Link]
AutoMatch, [Link]
AutoStan
awk

B

bag distance
Berkeley database
bigram, [Link]
bigram index
biomedical research
blocking index, [Link], [Link]
blocking technique
blocking variable
bootstrapping, [Link]
bounding box
bsddb3

C

categorical attributes
Centre for Epidemiology and Research
centroid
classifier, [Link]
    Fellegi and Sunter
    flexible
clerical review, [Link], [Link], [Link]
clusters of workstations
COL data set
comma separated value, [Link], [Link], [Link], [Link], [Link], [Link], [Link], [Link]
comparison.py
compression
consent
correction list, [Link], [Link], [Link], [Link], [Link]
CSV data set

D

data cleaning, [Link], [Link], [Link], [Link], [Link], [Link], [Link], [Link], [Link], [Link], [Link]
data integrity
data items
data matching
data mining, [Link], [Link]
data quality
data scrubbing
data segmentation
data standardisation, [Link], [Link], [Link], [Link], [Link], [Link], [Link], [Link], [Link], [Link]
data tagging
data warehouse
dataset.py
date format string, [Link]
date parsing
date standardisation
date.py, [Link]
deduplication, [Link], [Link], [Link]
difflib
directory separator, [Link], [Link]
dirsep, [Link], [Link]
disagreement weight
Double-Metaphone, [Link]
download

E

edit distance
encode.py
epidemiological studies
ethics committee
ETL

F

Fellegi and Sunter
    classifier
field comparison function, [Link], [Link], [Link], [Link]
field passing, [Link]
fileanalysis.py
fixed column fields, [Link]
frequency table, [Link], [Link], [Link]
fuzzy matching
fuzzy technique

G

G-NAF
generate.py
geocode reference data set
geocode table, [Link]
geocoding
geographical information system
GIS, [Link], [Link], [Link]
gnaffunctions.py, [Link]
greedy matching
grep/agrep

H

health data
health research
hidden Markov model, [Link], [Link], [Link], [Link], [Link]
    smoothing, [Link]
        absolute discounting
        Laplace
    states
    training, [Link]
    Viterbi algorithm, [Link], [Link], [Link]
histogram

I

indexing.py
information retrieval
installation
inverted index

J

Jaro
join operation

L

list washing
logging.py, [Link]
look-up table, [Link], [Link], [Link], [Link], [Link]
lookup.py, [Link]

M

m-probability
machine learning, [Link], [Link], [Link]
mailing lists
match weight, [Link]
maximum likelihood estimate
Memory data set
merge/purge processing
Microsoft Windows
missing weight
Mozilla Public License, [Link]
MPI
MS-DOS
multiprocessor
MySQL, [Link]

N

name.py, [Link]
name_corr.lst
name_misc.tbl
neighbour region table, [Link]
New South Wales Department of Health, [Link]
Newcombe and Kennedy
NYSIIS, [Link], [Link]

O

object identity
open source
output dataset
output field, [Link], [Link], [Link], [Link], [Link], [Link], [Link], [Link]

P

parallel computing
parallelism
permwinkler
personal attributes
phonenum.py, [Link]
Phonex, [Link]
pickle, [Link]
pivot year
PostgreSQL
PGSQL data set
pre-processing, [Link], [Link]
privacy
probabilistic linkage, [Link]
process-gnaf.py, [Link], [Link], [Link], [Link], [Link], [Link], [Link]
project-geocode.py
project.py, [Link], [Link], [Link], [Link]
Pypar
Python, [Link], [Link]
    dictionary, [Link]

Q

qgramindex.py

R

randomselect.py, [Link]
record comparator, [Link], [Link], [Link], [Link]
record linkage, [Link]
record pair
record standardisation
regular expressions
results
rule-based, [Link], [Link]

S

scalar attributes
seqmatch
shelve, [Link]
Shelve data set
shelve.py, [Link]
simplehmm.py, [Link], [Link]
simplehmmTest.py
Sleepycat Software
SNOBOL
sorting, [Link]
sorting index
sortwinkler
Soundex, [Link], [Link], [Link]
speedup
SQL, [Link], [Link]
SQL data set, [Link]
standardisation.py, [Link], [Link], [Link], [Link], [Link]
street level match
stringcmp.py
supercomputer

T

tag, [Link], [Link], [Link], [Link]
tagdata.py, [Link], [Link], [Link]
tagging table, [Link]
telephone standardisation
territory.tbl
text indexing
text segmentation
trainhmm.py, [Link], [Link]
training
training data, [Link]
transitive closure

U

u-probability
unique identifier
Unix

W

weight vector, [Link]
Winkler, [Link], [Link]
word spilling, [Link], [Link], [Link], [Link]

Z

zlib