The classical Fellegi and Sunter classifier [13] simply sums
all the log2-weights in a weight vector (as calculated by a
RecordComparator, as discussed in Section 9.3), and then uses two
thresholds to classify a record pair into one of three classes:
links, non-links or possible links. The results of a classification
are stored in a data structure, which can then be used to produce
various output forms as presented in Chapter 11.
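The summing and thresholding step described above can be sketched in a few lines of plain Python. This is only an illustration, not the classifier's actual implementation: the function name classify_weight_vector is hypothetical, and the handling of sums that fall exactly on a threshold (treated here as possible links) is an assumption.

```python
def classify_weight_vector(weight_vector, lower_threshold, upper_threshold):
    """Sum the weights of one record pair and assign it to a class.

    Illustrative sketch only (hypothetical helper, not the library's code).
    Assumption: sums equal to a threshold count as possible links.
    """
    if lower_threshold > upper_threshold:
        raise ValueError('lower_threshold must not exceed upper_threshold')

    total_weight = sum(weight_vector)  # Classical approach: a simple sum

    if total_weight > upper_threshold:
        return total_weight, 'link'
    if total_weight < lower_threshold:
        return total_weight, 'non-link'
    return total_weight, 'possible link'


# Three example weight vectors, classified with thresholds 10.0 and 50.0
for weights in ([20.0, 25.0, 10.0], [12.0, 8.0], [5.0, -2.0, 1.0]):
    print(weights, classify_weight_vector(weights, 10.0, 50.0))
```

Moving the two thresholds closer together shrinks the possible links class, at the cost of more pairs being assigned automatically to links or non-links.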
When a Fellegi and Sunter classifier is initialised, the following arguments need to be given.
name
dataset_a
dataset_b
The same as dataset_a in a deduplication process, but
most likely a different data set in a linkage process (unless
different parts of the same data set are to be linked).
lower_threshold
upper_threshold
The following example shows how a Fellegi and Sunter classifier can be initialised.
# ====================================================================
f_s_classifier = FellegiSunterClassifier(name = 'My F & S classifier',
                                         dataset_a = mydata_1,
                                         dataset_b = mydata_2,
                                         lower_threshold = 10.0,
                                         upper_threshold = 50.0)