Index

A

address level match
address.py, [Link]
agreement weight
ANU Data Mining Group
ANU Open Source License, [Link], [Link]
approximate string comparator, [Link]
assignment, [Link]
    Auction algorithm
    one-to-one, [Link]
AutoMatch, [Link]
AutoStan
awk

B

bag distance
Berkeley database
bigram, [Link]
bigram index
biomedical research
blocking index, [Link], [Link]
blocking technique
blocking variable
bootstrapping, [Link]
bounding box
bsddb3

C

categorical attributes
Centre for Epidemiology and Research
centroid
classifier, [Link]
    Fellegi and Sunter
    flexible
clerical review, [Link], [Link], [Link]
clusters of workstations
COL data set
comma separated value, [Link], [Link], [Link], [Link], [Link], [Link], [Link], [Link]
comparison.py
compression
consent
correction list, [Link], [Link], [Link], [Link], [Link]
CSV data set

D

data cleaning, [Link], [Link], [Link], [Link], [Link], [Link], [Link], [Link], [Link], [Link], [Link]
data integrity
data items
data matching
data mining, [Link], [Link]
data quality
data scrubbing
data segmentation
data standardisation, [Link], [Link], [Link], [Link], [Link], [Link], [Link], [Link], [Link], [Link]
data tagging
data warehouse
dataset.py
date format string, [Link]
date parsing
date standardisation
date.py, [Link]
deduplication, [Link], [Link], [Link]
difflib
directory separator, [Link], [Link]
dirsep, [Link], [Link]
disagreement weight
Double-Metaphone, [Link]
download

E

edit distance
encode.py
epidemiological studies
ethics committee
ETL

F

Fellegi and Sunter classifier
field comparison function, [Link], [Link], [Link], [Link]
field passing, [Link]
fileanalysis.py
fixed column fields, [Link]
frequency table, [Link], [Link], [Link]
fuzzy matching
fuzzy technique

G

G-NAF
generate.py
geocode reference data set
geocode table, [Link]
geocoding
geographical information system
GIS, [Link], [Link], [Link]
gnaffunctions.py, [Link]
greedy matching
grep/agrep

H

health data
health research
hidden Markov model, [Link], [Link], [Link], [Link], [Link]
    smoothing, [Link]
        absolute discounting
        Laplace
    training, [Link]
    Viterbi algorithm, [Link], [Link], [Link]

I

indexing.py
information retrieval
installation
inverted index

J

Jaro
join operation

L

list washing
logging.py, [Link]
look-up table, [Link], [Link], [Link], [Link], [Link]
lookup.py, [Link]

M

m-probability
machine learning, [Link], [Link], [Link]
mailing lists
match weight, [Link]
maximum likelihood estimate
Memory data set
merge/purge processing
Microsoft Windows
missing weight
Mozilla Public License, [Link]
MPI
MS-DOS
multiprocessor
MySQL, [Link]

N

name.py, [Link]
name_corr.lst
name_misc.tbl
neighbour region table, [Link]
New South Wales Department of Health, [Link]
Newcombe and Kennedy
NYSIIS, [Link], [Link]

O

object identity
open source
output dataset
output field, [Link], [Link], [Link], [Link], [Link], [Link], [Link], [Link]

P

parallel computing
parallelism
permwinkler
personal attributes
phonenum.py, [Link]
Phonex, [Link]
pickle, [Link]
pivot year
PostgreSQL
PQSQL data set
pre-processing, [Link], [Link]
privacy
probabilistic linkage, [Link]
process-gnaf.py, [Link], [Link], [Link], [Link], [Link], [Link], [Link]
project-geocode.py
project.py, [Link], [Link], [Link], [Link]
Pypar
Python, [Link], [Link]
    dictionary, [Link]

Q

qgramindex.py

R

randomselect.py, [Link]
record comparator, [Link], [Link], [Link], [Link]
record linkage, [Link]
record pair
record standardisation
regular expressions
results
rule-based, [Link], [Link]

S

scalar attributes
seqmatch
shelve, [Link]
Shelve data set
shelve.py, [Link]
simplehmm.py, [Link], [Link]
simplehmmTest.py
Sleepycat Software
SNOBOL
sorting, [Link]
sorting index
sortwinkler
Soundex, [Link], [Link], [Link]
speedup
SQL, [Link], [Link]
SQL data set, [Link]
standardisation.py, [Link], [Link], [Link], [Link], [Link]
street level match
stringcmp.py
supercomputer

T

tag, [Link], [Link], [Link], [Link]
tagdata.py, [Link], [Link], [Link]
tagging table, [Link]
telephone standardisation
territory.tbl
text indexing
text segmentation
trainhmm.py, [Link], [Link]
training
training data, [Link]
transitive closure

U

u-probability
unique identifier
Unix

W

weight vector, [Link]
Winkler, [Link], [Link]
word spilling, [Link], [Link], [Link], [Link]

Z

zlib