'This should never happen!' in which case a record is not standardised properly.
II (initial) tag used in name standardisation. This will allow HMMs to classify single-letter words correctly more easily, and then allow their transformation at the output stage. For example, it is hazardous to transform every instance of the string ' r ' into ' road ' during the initial cleaning stage, but it is quite safe to make the transformation to 'road' in the output stage, for those instances that have been classified as the wayfare type.
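The output-stage transformation described above could be sketched as follows. This is only an illustration under stated assumptions: the tag name "WT", the expansion table and the function expand_wayfare_types are hypothetical and are not taken from the Febrl code, and the input is assumed to be a list of (token, tag) pairs produced after the HMM has classified each token.

```python
# Hypothetical tag name and lookup table, for illustration only.
WAYFARE_TYPE_TAG = "WT"
WAYFARE_TYPE_EXPANSIONS = {"r": "road"}


def expand_wayfare_types(tagged_tokens):
    """Expand abbreviations only for tokens classified as a wayfare type.

    tagged_tokens is assumed to be a list of (token, tag) pairs, available
    once every token has been assigned a tag by the HMM.
    """
    result = []
    for token, tag in tagged_tokens:
        if tag == WAYFARE_TYPE_TAG:
            # Safe here: the token is already known to be a wayfare type.
            token = WAYFARE_TYPE_EXPANSIONS.get(token, token)
        result.append((token, tag))
    return result
```

A single-letter token carrying any other tag is left untouched, which is what makes the late transformation safer than a blanket string replacement during cleaning.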
If the unit_number output field starts with 'unit' (as in, for example, 'unit42'), then remove the 'unit' prefix from the unit_number field and make the unit_type field equal to 'unit'. There is no need to invent a new rule specification language to implement this: Python is already easy enough for users of Febrl to write their own transformation rules. All the user needs to do is specify a function name for the transformation of the output field. However, some more object-oriented refactoring would make this easier to implement, e.g. if the output were an object (class instance) which could be passed to such transformation functions.
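A user-written transformation rule of this kind might look like the sketch below. It assumes, purely for illustration, that the standardised output is available as a dictionary keyed by output field name; the names split_unit_prefix, apply_output_transforms and OUTPUT_TRANSFORMS are hypothetical and do not exist in Febrl.

```python
def split_unit_prefix(record):
    """Move a leading 'unit' from the unit_number field into unit_type."""
    unit_number = record.get("unit_number", "")
    if unit_number.startswith("unit"):
        record = dict(record)  # work on a copy, leave the input unchanged
        record["unit_number"] = unit_number[len("unit"):].strip()
        record["unit_type"] = "unit"
    return record


# Hypothetical registry: the user only lists the functions to apply.
OUTPUT_TRANSFORMS = [split_unit_prefix]


def apply_output_transforms(record, transforms=OUTPUT_TRANSFORMS):
    """Pass the output record through each transformation in turn."""
    for transform in transforms:
        record = transform(record)
    return record
```

If the output were refactored into a class instance, the same functions could take that object instead of a dictionary.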
Field1 = Wayfare number
Field2 = Wayfare name and type
Field3 = Locality
Field4 = State/Territory
Field5 = Postcode
Field1 Field2, Field3 Field4 Field5
23 Smith St North, Fairfield NSW 2345
23 Smith St, North Fairfield NSW 2345
23 Smith St North Fairfield NSW 2345
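One way to read the three examples above is that the comma decides whether 'North' belongs to the wayfare or to the locality, and that without a comma the split is ambiguous. The sketch below illustrates this for the layout 'Field1 Field2, Field3 Field4 Field5'. It is an assumption-laden illustration, not Febrl's method: the function name split_on_comma and the dictionary keys are hypothetical, and State/Territory and Postcode are each assumed to be a single token.

```python
def split_on_comma(address):
    """Split 'Field1 Field2, Field3 Field4 Field5' at its single comma.

    Returns None when there is not exactly one comma, or when either
    side has too few tokens, because the field boundary is then unknown.
    """
    if address.count(",") != 1:
        return None
    wayfare_part, locality_part = (part.split() for part in address.split(","))
    if len(wayfare_part) < 2 or len(locality_part) < 3:
        return None
    return {
        "wayfare_number": wayfare_part[0],
        "wayfare_name_and_type": " ".join(wayfare_part[1:]),
        "locality": " ".join(locality_part[:-2]),
        "state_territory": locality_part[-2],
        "postcode": locality_part[-1],
    }
```

With the comma after 'North' the locality comes out as 'Fairfield'; with the comma before it, as 'North Fairfield'; the third form has no comma and is returned as None rather than guessed.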
Wymallee Arthur St Gundagai NSW 2345
Windy Willows Ave Littleville NSW 2345