9.4 Example Field and Record Comparator Initialisation

Field comparators must be initialised (constructed) before they can be used. In the following examples, we assume that a data set mydata_1 has the fields givenname, surname, age and postcode, and that a second data set mydata_2 has the fields gname, sname, dob, mob, yob (day, month and year of birth) and pcode. Additionally, a frequency table surname_freq and a geocode look-up table postcode_geocode are assumed to be available and loaded. The examples show how to set up different field comparators, which are then passed to a record comparator (initialised at the end of the example code) to compute weight vectors.

# ====================================================================

surname_exact = \
  FieldComparatorExactString(fields_a='surname',
                             fields_b='sname',
                             m_prob=0.95, u_prob=0.001,
                             missing_weight=0.0,
                             frequency_table=surname_freq,
                             freq_table_max_weight=20.0,
                             freq_table_min_weight=-20.0)

surname_jaro = \
  FieldComparatorApproxString(fields_a='surname',
                              fields_b='sname',
                              m_prob=0.95, u_prob=0.001,
                              missing_weight=0.0,
                              frequency_table=surname_freq,
                              freq_table_max_weight=20.0,
                              freq_table_min_weight=-20.0,
                              compare_method='jaro')

givenname_trunc = \
  FieldComparatorTruncateString(fields_a='givenname',
                                fields_b='gname',
                                m_prob=0.90, u_prob=0.02,
                                missing_weight=0.0,
                                max_string_len=4)

postcode_keydiff = \
  FieldComparatorKeyDiff(fields_a='postcode',
                         fields_b='pcode',
                         m_prob=0.98, u_prob=0.001,
                         missing_weight=0.0,
                         max_key_diff=1)

postcode_distance = \
  FieldComparatorDistance(fields_a='postcode',
                          fields_b='pcode',
                          m_prob=0.98, u_prob=0.001,
                          missing_weight=0.0,
                          geocode_table=postcode_geocode,
                          max_distance=42.0)

age = FieldComparatorAge(fields_a='age',
                         fields_b=['dob','mob','yob'],
                         m_probability_day=0.9,
                         u_probability_day=0.01,
                         m_probability_month=0.98,
                         u_probability_month=0.0001,
                         m_probability_year=0.95,
                         u_probability_year=0.001,
                         missing_weight=0.0,
                         max_perc_diff=10,
                         fix_date='20000101')

field_comparisons = \
  [surname_exact, surname_jaro, givenname_trunc, postcode_keydiff,
   postcode_distance, age]

record_comparator = \
  RecordComparator(mydata_1, mydata_2, field_comparisons)
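
To give a feel for what the m_prob and u_prob arguments mean, the following stand-alone sketch computes agreement and disagreement weights under the assumption that the standard Fellegi and Sunter formulation with base-2 logarithms is used. The function names (agreement_weight, disagreement_weight, truncated_exact_weight) are hypothetical and are not part of the comparator classes shown above; the actual comparators may compute their weights differently, for example when a frequency table is supplied.

```python
import math


def agreement_weight(m_prob, u_prob):
    """Weight assigned when two field values agree: log2(m / u)."""
    return math.log2(m_prob / u_prob)


def disagreement_weight(m_prob, u_prob):
    """Weight assigned when two field values disagree: log2((1-m) / (1-u))."""
    return math.log2((1.0 - m_prob) / (1.0 - u_prob))


def truncated_exact_weight(value_a, value_b, m_prob, u_prob,
                           max_string_len, missing_weight=0.0):
    """Illustrative truncated comparison (assumed behaviour, not the
    library's code): only the first max_string_len characters of both
    values are compared exactly."""
    if not value_a or not value_b:
        # One or both values are missing
        return missing_weight
    if value_a[:max_string_len] == value_b[:max_string_len]:
        return agreement_weight(m_prob, u_prob)
    return disagreement_weight(m_prob, u_prob)


# Same settings as givenname_trunc above: m_prob=0.90, u_prob=0.02,
# max_string_len=4
print(truncated_exact_weight('peter', 'pete', 0.90, 0.02, 4))  # agreement
print(truncated_exact_weight('peter', 'paul', 0.90, 0.02, 4))  # disagreement
print(truncated_exact_weight('peter', '', 0.90, 0.02, 4))      # missing
```

Under these assumptions, a large m_prob combined with a small u_prob (as for the surname and postcode comparators above) yields a high positive agreement weight, while a disagreement yields a negative weight.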