If a frequency table is given for a field comparator that supports frequency-dependent weight calculation, both the agreement and the disagreement weight are calculated from the frequencies of the compared input field values, provided these values are found in the frequency table.
If a field value is listed in the frequency table, its count and the sum of all counts in the frequency table are used to compute the frequency probability of this value.
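As an illustration only, the frequency probability described above can be sketched in Python as follows. The function name frequency_probability and the dictionary representation of the frequency table are assumptions made for this example and are not taken from the Febrl source code.

```python
def frequency_probability(freq_table, value):
    """Return count(value) / sum(all counts), or None if the value is not listed.

    freq_table is assumed (for this sketch) to be a dictionary that maps
    field values to their counts.
    """
    total = sum(freq_table.values())
    count = freq_table.get(value)
    if count is None or total == 0:
        return None  # Value not in the table: generic weights apply instead
    return count / total


# Hypothetical surname counts, used only for this example
surnames = {"smith": 50, "miller": 30, "dijkstra": 20}

print(frequency_probability(surnames, "smith"))
print(frequency_probability(surnames, "unknown"))
```

A return value of None stands for the case where the value is not found in the frequency table.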
If the attribute freq_table_max_weight (see Section 9.2 above) is set and the calculated agreement weight is larger than this value, the agreement weight is limited to the value of freq_table_max_weight. Similarly, if the freq_table_min_weight attribute is given and the calculated disagreement weight is smaller than this value, the disagreement weight is limited to the value of freq_table_min_weight.
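The limiting of the two weights can be sketched as below. This is a minimal example, not Febrl's implementation: the function limit_weights is hypothetical, and only the two attribute names freq_table_max_weight and freq_table_min_weight come from the text. A value of None is assumed to mean that the attribute is not set.

```python
def limit_weights(agree_weight, disagree_weight,
                  freq_table_max_weight=None, freq_table_min_weight=None):
    """Limit frequency-based weights to the configured bounds, if any."""
    # Cap the agreement weight from above
    if freq_table_max_weight is not None and agree_weight > freq_table_max_weight:
        agree_weight = freq_table_max_weight
    # Cap the disagreement weight from below
    if freq_table_min_weight is not None and disagree_weight < freq_table_min_weight:
        disagree_weight = freq_table_min_weight
    return agree_weight, disagree_weight


# Example values chosen for illustration only
print(limit_weights(12.5, -9.0, freq_table_max_weight=10.0,
                    freq_table_min_weight=-5.0))
```

Weights that already lie within the bounds, or for which no bound is set, are returned unchanged.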
If a value is not found in a frequency table, the M- and U-probabilities are used to compute generic agreement and disagreement weights as described in Section 9.2.
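For readers without Section 9.2 at hand, the sketch below shows one common way to derive generic weights from M- and U-probabilities, using the usual Fellegi and Sunter log-ratio form with base 2 logarithms. Whether Febrl uses exactly this form is an assumption here; Section 9.2 is the authoritative description. The function name generic_weights is hypothetical.

```python
import math


def generic_weights(m_prob, u_prob):
    """Return (agreement weight, disagreement weight) for given M and U.

    Assumes the standard log-ratio form:
      agreement    = log2(M / U)
      disagreement = log2((1 - M) / (1 - U))
    """
    if not (0.0 < m_prob < 1.0 and 0.0 < u_prob < 1.0):
        raise ValueError("M- and U-probabilities must lie strictly between 0 and 1")
    agree_weight = math.log2(m_prob / u_prob)
    disagree_weight = math.log2((1.0 - m_prob) / (1.0 - u_prob))
    return agree_weight, disagree_weight


# Example probabilities chosen for illustration only
agree, disagree = generic_weights(0.9, 0.1)
print(agree, disagree)
```

With M larger than U, the agreement weight is positive and the disagreement weight is negative.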
For each of the two field values (one from each record) we now have an agreement and a disagreement weight. If the two values are the same, the minimum of the two agreement weights is selected; if they differ, the maximum of the two disagreement weights is selected. Partial agreement weights are then calculated as described in the sections below.
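The selection rule for exact agreement and disagreement can be sketched as follows. The function select_weight and the tuple layout (agreement weight, disagreement weight) are assumptions made for this example; partial agreement is deliberately left out because it depends on the individual comparison function.

```python
def select_weight(value_a, value_b, weights_a, weights_b):
    """Choose the final weight for a pair of field values.

    weights_a and weights_b are (agreement, disagreement) tuples, one for
    the value taken from each record.
    """
    agree_a, disagree_a = weights_a
    agree_b, disagree_b = weights_b
    if value_a == value_b:
        # Values agree: take the smaller of the two agreement weights
        return min(agree_a, agree_b)
    # Values differ: take the larger of the two disagreement weights
    return max(disagree_a, disagree_b)


# Example weights chosen for illustration only
print(select_weight("smith", "smith", (4.0, -3.0), (5.0, -2.0)))
print(select_weight("smith", "miller", (4.0, -3.0), (5.0, -2.0)))
```

In both branches the rule picks the weight closer to zero, that is, the more cautious of the two.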
The following sections contain descriptions of the field comparison functions currently provided by Febrl. Improved and additional functions will be added in later versions of this software.