The classical Fellegi and Sunter classifier [13] simply sums
all the log2-weights in a weight vector (as calculated by a
RecordComparator, as discussed in Section 9.3), and then uses two
thresholds to classify a record pair into one of the three classes
links, non-links or possible links. The results of a classification
are stored in a data structure, which can then be used to produce
various output forms as presented in Chapter 11.
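The summing and thresholding step described above can be sketched as follows. This is a minimal illustration only, not the classifier's actual implementation: the function name classify_pair and the returned class labels are hypothetical, and the handling of a sum that falls exactly on a threshold (treated here as a possible link) is an assumption.

```python
def classify_pair(weight_vector, lower_threshold, upper_threshold):
    """Classify one record pair from its weight vector.

    Hypothetical helper for illustration; boundary handling is an assumption.
    """
    if lower_threshold > upper_threshold:
        raise ValueError('lower_threshold must not exceed upper_threshold')

    # Sum all the field weights of the record pair
    total_weight = sum(weight_vector)

    if total_weight > upper_threshold:
        return 'link'
    if total_weight < lower_threshold:
        return 'non-link'
    # Anything between (or exactly on) the two thresholds
    return 'possible link'


# Example weight vectors, using thresholds of 10.0 and 50.0
print(classify_pair([20.0, 25.0, 10.0], 10.0, 50.0))
print(classify_pair([12.0, 8.5, -0.5], 10.0, 50.0))
print(classify_pair([5.0, -3.0, 1.0], 10.0, 50.0))
```

Setting both thresholds to the same value leaves, under these assumptions, only sums exactly equal to that value in the possible links class.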
When a Fellegi and Sunter classifier is initialised, the following arguments need to be given.
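name
    A name for the classifier.
dataset_a
    The first data set.
dataset_b
    The second data set. This is the same as dataset_a in a
    deduplication process, but most likely a different data set in a
    linkage process (unless different parts of the same data set are
    to be linked).
lower_threshold
    The lower of the two classification thresholds.
upper_threshold
    The upper of the two classification thresholds.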
The following example shows how a Fellegi and Sunter classifier can be initialised.
# ====================================================================

f_s_classifier = FellegiSunterClassifier(name = 'My F & S classifier',
                                         dataset_a = mydata_1,
                                         dataset_b = mydata_2,
                                         lower_threshold = 10.0,
                                         upper_threshold = 50.0)