9.5.1 Fellegi and Sunter Classifier

The classical Fellegi and Sunter classifier [13] sums all the log2 weights in a weight vector (as calculated by a RecordComparator, see Section 9.3) and then compares this sum against two thresholds to classify the record pair into one of three classes: links, non-links or possible links. A pair whose summed weight lies above the upper threshold is classified as a link, a pair whose summed weight lies below the lower threshold as a non-link, and a pair in between as a possible link. The results of a classification are stored in a data structure, which can then be used to produce the various output forms presented in Chapter 11.
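The decision rule described above can be sketched in a few lines of plain Python. This is only an illustration and not the classifier's actual implementation: the function name classify_pair and the class labels are hypothetical, and the treatment of a sum that falls exactly on a threshold (here counted as a possible link) is an assumption.

```python
from typing import Sequence


def classify_pair(weights: Sequence[float],
                  lower_threshold: float,
                  upper_threshold: float) -> str:
    """Sum a weight vector and classify the record pair (hypothetical helper)."""
    if lower_threshold > upper_threshold:
        raise ValueError('lower_threshold must not exceed upper_threshold')

    total_weight = sum(weights)  # Sum of all log2 weights in the vector

    if total_weight > upper_threshold:
        return 'link'
    if total_weight < lower_threshold:
        return 'non-link'
    # Assumption: sums exactly on a threshold count as possible links
    return 'possible link'


# Made-up weight vectors, using the thresholds 10.0 and 50.0
high = classify_pair([15.0, 22.5, 18.0], 10.0, 50.0)    # sum is 55.5
middle = classify_pair([12.5, 8.0, -3.5, 20.0], 10.0, 50.0)  # sum is 37.0
low = classify_pair([4.0, -6.5, 3.0], 10.0, 50.0)       # sum is 0.5
```

With these made-up vectors, the first pair falls above the upper threshold, the second between the two thresholds, and the third below the lower threshold.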

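When a Fellegi and Sunter classifier is initialised, the following arguments need to be given: a name, the two data sets (dataset_a and dataset_b), and the lower and upper classification thresholds (lower_threshold and upper_threshold).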

The following example shows how a Fellegi and Sunter classifier can be initialised.

# ====================================================================

f_s_classifier = FellegiSunterClassifier(name = 'My F & S classifier',
                                    dataset_a = mydata_1,
                                    dataset_b = mydata_2,
                              lower_threshold = 10.0,
                              upper_threshold = 50.0)