Field comparator functions must be initialised (or constructed)
before they can be used. In the following examples, we assume that
a data set mydata_1 has the fields givenname, surname, age and
postcode, and that a second data set mydata_2 has the fields gname,
sname, dob, mob, yob (day, month and year of birth) and pcode.
Additionally, a frequency table surname_freq is available and has
been loaded. The following examples illustrate how to set up
different field comparators that can be used to compute weight
vectors using a record comparator (which is initialised at the end
of the example code).
# ====================================================================

# Exact and approximate (Jaro) comparison of surnames, both using the
# surname frequency table
surname_exact = FieldComparatorExactString(
    fields_a='surname', fields_b='sname',
    m_prob=0.95, u_prob=0.001, missing_weight=0.0,
    frequency_table=surname_freq,
    freq_table_max_weight=20.0, freq_table_min_weight=-20.0)

surname_jaro = FieldComparatorApproxString(
    fields_a='surname', fields_b='sname',
    m_prob=0.95, u_prob=0.001, missing_weight=0.0,
    frequency_table=surname_freq,
    freq_table_max_weight=20.0, freq_table_min_weight=-20.0,
    compare_method='jaro')

# Given names compared as truncated strings (max_string_len=4)
givenname_trunc = FieldComparatorTruncateString(
    fields_a='givenname', fields_b='gname',
    m_prob=0.90, u_prob=0.02, missing_weight=0.0,
    max_string_len=4)

# Postcodes compared by key difference and by distance
# (postcode_geocode is not defined in this example; it is assumed to
# be a geocode look-up table that has already been loaded)
postcode_keydiff = FieldComparatorKeyDiff(
    fields_a='postcode', fields_b='pcode',
    m_prob=0.98, u_prob=0.001, missing_weight=0.0,
    max_key_diff=1)

postcode_distance = FieldComparatorDistance(
    fields_a='postcode', fields_b='pcode',
    m_prob=0.98, u_prob=0.001, missing_weight=0.0,
    geocode_table=postcode_geocode,
    max_distance=42.0)

# Age in the first data set compared with day, month and year of
# birth in the second data set
age = FieldComparatorAge(
    fields_a='age', fields_b=['dob', 'mob', 'yob'],
    m_probability_day=0.9, u_probability_day=0.01,
    m_probability_month=0.98, u_probability_month=0.0001,
    m_probability_year=0.95, u_probability_year=0.001,
    missing_weight=0.0, max_perc_diff=10,
    fix_date='20000101')

# Collect the field comparators and initialise the record comparator
field_comparisons = [surname_exact, surname_jaro, givenname_trunc,
                     postcode_keydiff, postcode_distance, age]

record_comparator = RecordComparator(mydata_1, mydata_2,
                                     field_comparisons)
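
To give a feel for how the m_prob, u_prob and missing_weight arguments could turn into the entries of a weight vector, here is a minimal stand-alone sketch in Python 3. It assumes the conventional log-ratio weights of probabilistic record linkage (agreement weight log2(m/u), disagreement weight log2((1-m)/(1-u))); the comparator classes shown above may compute their weights differently, for example when a frequency table is used. The helper functions and the two toy records are hypothetical and are not part of the toolkit.

```python
import math


def agreement_weight(m_prob, u_prob):
    """Weight given when two field values agree: log2(m / u)."""
    return math.log2(m_prob / u_prob)


def disagreement_weight(m_prob, u_prob):
    """Weight given when two values disagree: log2((1 - m) / (1 - u))."""
    return math.log2((1.0 - m_prob) / (1.0 - u_prob))


def exact_field_weight(value_a, value_b, m_prob, u_prob,
                       missing_weight=0.0):
    """Hypothetical helper: weight for one exact string comparison."""
    if not value_a or not value_b:  # Empty or None counts as missing
        return missing_weight
    if value_a == value_b:
        return agreement_weight(m_prob, u_prob)
    return disagreement_weight(m_prob, u_prob)


# Two toy records using the field names of the example data sets
rec_a = {'surname': 'miller', 'postcode': '2602'}
rec_b = {'sname': 'miller', 'pcode': ''}

# One weight per field comparison, collected into a weight vector
weight_vector = [
    exact_field_weight(rec_a['surname'], rec_b['sname'], 0.95, 0.001),
    exact_field_weight(rec_a['postcode'], rec_b['pcode'], 0.98, 0.001),
]
print(weight_vector)
```

In this sketch the matching surnames contribute a positive agreement weight, while the empty pcode value falls back to the missing weight of 0.0, so a missing value neither supports nor contradicts a match.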