## 9.5.2 Flexible Classifier

The flexible classifier allows different methods to be used to calculate the final matching weight for a weight vector (as calculated by a `RecordComparator`, discussed in Section 9.3). As in the Fellegi and Sunter classifier, two thresholds are used to classify a record pair into one of three classes: links, non-links or possible links. The results of a classification are stored in a data structure, which can then be used to produce various output forms as presented in Chapter 11.

Instead of simply summing all weights in a weight vector, the flexible classifier calculates the final weight in two steps. First, the user defines a list of tuples, each containing a function and the elements of the weight vector to which that function is applied; each tuple produces one intermediate weight. The final weight is then calculated by applying another function, also chosen by the user, to these intermediate weights.

The following functions can currently be used within the flexible classifier:

• `min`
Take the minimum value in the selected weight vector elements.
• `max`
Take the maximum value in the selected weight vector elements.
• `add`
Add the values in the selected weight vector elements.
• `mult`
Multiply the values in the selected weight vector elements.
• `avrg`
Calculate the average of the values in the selected weight vector elements.
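
The behaviour of the five functions can be illustrated with a minimal Python sketch. The `FUNCTIONS` dictionary and the `apply_function` helper are hypothetical names introduced here for illustration only; they are not part of the classifier, and the weight values are made up. Elements are selected by a list of indexes starting from 0.

```python
import math

# Hypothetical mapping from the function names above to plain Python callables.
# This is an illustration only, not the classifier's internal code.
FUNCTIONS = {
    'min': min,
    'max': max,
    'add': sum,
    'mult': math.prod,
    'avrg': lambda values: sum(values) / len(values),
}

def apply_function(funct_name, weight_vector, indexes):
    """Apply the named function to the weight vector elements at `indexes`."""
    selected = [weight_vector[i] for i in indexes]
    return FUNCTIONS[funct_name](selected)

weights = [2.0, 4.0, -1.0, 0.5, 3.0]  # made-up weight vector

# Indexes [0, 1, 4] select the values 2.0, 4.0 and 3.0.
for name in ('min', 'max', 'add', 'mult', 'avrg'):
    print(name, apply_function(name, weights, [0, 1, 4]))
```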

Weight vector elements are selected by giving the desired indexes (starting from 0) in a Python list, e.g. `[0,1,4]` selects the first two and the fifth field comparison weights. When initialising a flexible classifier, the argument `calculate` needs to be set to a list made of tuples with functions and weight vector elements as shown in the example below.

The final weight can then be calculated by again using one of the functions `'min'`, `'max'`, `'add'`, `'mult'`, and `'avrg'` given above. The argument `final_funct` has to be used for this when a flexible classifier is initialised.

Consider an example. Assume we have weight vectors that contain weights calculated by eight different field comparison functions (as explained in Section 9.2). We would like to calculate the final weight as the average of 1) the sum of the first four weights, 2) the maximum of weights five and six, and 3) the minimum of weights seven and eight. The corresponding flexible classifier can then be initialised as shown in the following example code.

```
# ====================================================================

flex_classifier = FlexibleClassifier(name = 'My flexible classifier',
                                     dataset_a = mydata_1,
                                     dataset_b = mydata_2,
                                     lower_threshold = 10.0,
                                     upper_threshold = 50.0,
                                     calculate = [('add', [0,1,2,3]),
                                                  ('max', [4,5]),
                                                  ('min', [6,7])],
                                     final_funct = 'avrg')
```
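
To make the two-step calculation in this example concrete, the following sketch traces it for a single weight vector. It is a stand-alone illustration under stated assumptions, not the classifier's own implementation: the `FUNCTIONS` dictionary, the `flexible_weight` helper and the weight values are all hypothetical.

```python
import math

# Hypothetical stand-ins for the five function names (illustration only).
FUNCTIONS = {
    'min': min,
    'max': max,
    'add': sum,
    'mult': math.prod,
    'avrg': lambda values: sum(values) / len(values),
}

def flexible_weight(weight_vector, calculate, final_funct):
    """Return the intermediate weights and the final weight."""
    intermediate = [
        FUNCTIONS[funct_name]([weight_vector[i] for i in indexes])
        for funct_name, indexes in calculate
    ]
    return intermediate, FUNCTIONS[final_funct](intermediate)

# Same definitions as in the example above.
calculate = [('add', [0, 1, 2, 3]),
             ('max', [4, 5]),
             ('min', [6, 7])]

# Made-up weights from eight field comparison functions.
weight_vector = [5.0, 3.0, -2.0, 4.0, 6.0, 1.0, 2.0, 7.0]

intermediate, final = flexible_weight(weight_vector, calculate, 'avrg')
print(intermediate)  # [10.0, 6.0, 2.0]
print(final)         # 6.0
```

With these made-up values the sum of the first four weights is 10.0, the maximum of weights five and six is 6.0, and the minimum of weights seven and eight is 2.0, so their average is 6.0.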

Note that a weight can be used in more than one of the calculated intermediate weights, and that a weight can also be left unused. It is important, however, that the weight vectors have at least as many elements as are referenced in the `calculate` definitions (i.e. no definition should use an index equal to or larger than the length of the weight vectors).
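
The index constraint can be checked before any weights are calculated. The sketch below is one possible check, written under the assumption that all weight vectors have the same known length; `check_calculate` is a hypothetical helper and not part of the classifier. It also reports unused indexes, which is allowed but may be worth knowing.

```python
def check_calculate(calculate, vector_length):
    """Validate the indexes in `calculate` and return the unused indexes."""
    used = set()
    for funct_name, indexes in calculate:
        for index in indexes:
            if not 0 <= index < vector_length:
                raise ValueError(
                    'Index %d in %r definition is outside a weight vector '
                    'of length %d' % (index, funct_name, vector_length))
            used.add(index)
    return [i for i in range(vector_length) if i not in used]

# Index 1 is used twice and index 2 is not used at all; both are allowed.
calculate = [('add', [0, 1]),
             ('max', [1, 3])]

print(check_calculate(calculate, 4))  # [2]

try:
    check_calculate(calculate, 3)     # index 3 does not exist
except ValueError as error:
    print(error)
```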

When a flexible classifier is initialised, the following arguments need to be given.

• `name`
A name for the classifier. This should be a short string.
• `dataset_a`
A reference to a data set. This must be the same data set as the first data set defined within a record comparator.
• `dataset_b`
A reference to a data set, which must be the same as the second data set defined within a record comparator. This data set will be the same as `dataset_a` in a deduplication process, but most likely a different data set in a linkage process (unless different parts of the same data set are to be linked).
• `lower_threshold`
A number, which is the lower threshold for the classifier.
• `upper_threshold`
A number, which is the upper threshold for the classifier. It must be larger than the lower threshold.
• `calculate`
The definitions for the calculation of intermediate results using selected elements of the weight vector. This must be a list containing tuples, with each tuple being made of a function (one of `'min'`, `'max'`, `'add'`, `'mult'`, or `'avrg'`) and a list of the weight vector elements to be used (index numbers starting with 0).
• `final_funct`
The function to be used to calculate the final weight. Must be one of `'min'`, `'max'`, `'add'`, `'mult'`, or `'avrg'`.
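
Once a final weight has been calculated, the two thresholds determine the class of the record pair. The following sketch shows one way this decision could look. The `classify` function is hypothetical, and the handling of weights that fall exactly on a threshold (treated here as possible links) is an assumption, since this section does not specify it.

```python
def classify(final_weight, lower_threshold, upper_threshold):
    """Assign a record pair to one of the three classes."""
    if upper_threshold <= lower_threshold:
        raise ValueError('Upper threshold must be larger than lower threshold')
    if final_weight > upper_threshold:
        return 'link'
    if final_weight < lower_threshold:
        return 'non-link'
    # Assumption: weights equal to a threshold count as possible links.
    return 'possible link'

# Thresholds taken from the example initialisation above.
for weight in (60.0, 30.0, 6.0):
    print(weight, classify(weight, 10.0, 50.0))
```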