If a frequency table is given for a field comparator that supports frequency dependent weight calculation, the agreement weight is calculated from the frequency of the values in the compared input fields, provided these values are found in the frequency table.
If a field value is found in the given frequency table, its count and the sum of all entries in the frequency table are used to compute the frequency probability of this entry.
If the attribute freq_table_max_weight (see Section 9.2 above) is set and the calculated agreement weight is larger than this value, it is limited to the value of freq_table_max_weight.
If a value is not found in a frequency table, the M- and U-probabilities are used to compute generic agreement and disagreement weights as described in Section 9.2.
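The rules above can be sketched in a few lines of Python. This is only an illustrative sketch, not Febrl's actual code: the function name agreement_weight, the dictionary representation of the frequency table, and the weight formulas (log2 of the M-probability divided by the frequency probability, with log2(M/U) as the generic fallback) are assumptions made for this example. Section 9.2 remains the reference for the real formulas.

```python
import math

def agreement_weight(value, freq_table, m_prob, u_prob, max_weight=None):
    """Hypothetical helper: agreement weight for one field value.

    freq_table maps field values to counts (assumed representation).
    """
    count = freq_table.get(value)
    if count is None:
        # Value not in the table: generic weight from M- and U-probabilities
        # (assumed form log2(M/U)).
        return math.log2(m_prob / u_prob)

    # Frequency probability: count of this entry over the sum of all entries.
    freq_prob = count / sum(freq_table.values())

    # Assumption: the frequency probability takes the place of U.
    weight = math.log2(m_prob / freq_prob)

    # Limit the weight if a maximum (cf. freq_table_max_weight) is set.
    if max_weight is not None and weight > max_weight:
        weight = max_weight
    return weight


surnames = {"smith": 96, "miller": 3, "zylstra": 1}
common = agreement_weight("smith", surnames, 0.95, 0.01)
rare = agreement_weight("zylstra", surnames, 0.95, 0.01)
capped = agreement_weight("zylstra", surnames, 0.95, 0.01, max_weight=5.0)
unknown = agreement_weight("nguyen", surnames, 0.95, 0.01)
```

In this sketch a rare value such as "zylstra" receives a higher agreement weight than a common one such as "smith", and the optional maximum caps that weight.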
If the field values differ, agreement and disagreement weights are still calculated (and then used to calculate partial agreement weights as described in the following sections). While the disagreement weight is never calculated using frequency tables, frequency dependent agreement weights will be calculated if a frequency table is available and the values are found in this table.
Different input values might have different frequencies, resulting in different agreement weights as shown in the above formula. The smaller of the two frequency dependent agreement weights is selected and used, together with the generic disagreement weight, to calculate partial agreement weights as described in the sections below.
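The selection for differing field values can be illustrated as follows. Again this is a hedged sketch rather than Febrl's implementation: the function name weights_for_differing_values, the dictionary frequency table, the log2-based weight formulas, and the fallback to the U-probability for a value missing from the table are all assumptions for this example.

```python
import math

def weights_for_differing_values(val1, val2, freq_table, m_prob, u_prob):
    """Hypothetical helper: weights used as input for partial agreement."""
    total = sum(freq_table.values())

    def agree(value):
        count = freq_table.get(value)
        # Assumption: a value missing from the table falls back to U.
        prob = count / total if count is not None else u_prob
        return math.log2(m_prob / prob)

    # Take the minimum of the two frequency dependent agreement weights.
    agree_weight = min(agree(val1), agree(val2))

    # The disagreement weight never uses the frequency table
    # (assumed generic form log2((1-M)/(1-U))).
    disagree_weight = math.log2((1.0 - m_prob) / (1.0 - u_prob))
    return agree_weight, disagree_weight


surnames = {"smith": 96, "miller": 3, "zylstra": 1}
agree_w, disagree_w = weights_for_differing_values(
    "smith", "zylstra", surnames, 0.95, 0.01)
```

Under these assumptions the minimum always comes from the more frequent of the two values, so a common value compared with a rare one yields the more conservative agreement weight.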
The following sections contain descriptions of the field comparison functions currently provided by Febrl. Improved and additional functions will be added in later versions of this software.