'This should never happen!' in which case a record is not
standardised properly.
II (initial) tag used in name standardisation. This will allow
HMMs to more easily classify single-letter words correctly, and
then allow their transformation at the output stage. For example,
it is hazardous to transform every instance of the string ' r '
into ' road ' during the initial cleaning stage, but it is quite
safe to transform any instance of 'r' into 'road' in the output
stage where those instances have been classified as the wayfare
type.
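The difference between the two stages can be sketched in Python. This is only an illustration, not Febrl code: the function name, the expansion table and the label strings ('wayfare_type', 'wayfare_number', 'wayfare_name', 'initial', 'surname') are assumptions made for the example. The point is that the expansion is applied only to tokens the classifier has already labelled as a wayfare type.

```python
# Hypothetical expansion table, applied at the output stage only.
WAYFARE_TYPE_EXPANSIONS = {"r": "road"}


def expand_wayfare_types(tagged_tokens):
    """Expand abbreviations only for tokens labelled 'wayfare_type'.

    tagged_tokens is a list of (token, label) pairs; the labels are
    assumed to come from an earlier classification step (e.g. an HMM).
    """
    result = []
    for token, label in tagged_tokens:
        if label == "wayfare_type":
            # Safe: the classifier has decided this token is a wayfare type.
            token = WAYFARE_TYPE_EXPANSIONS.get(token, token)
        result.append((token, label))
    return result


address = [("12", "wayfare_number"), ("miller", "wayfare_name"),
           ("r", "wayfare_type")]
name = [("r", "initial"), ("smith", "surname")]

print(expand_wayfare_types(address))
print(expand_wayfare_types(name))
```

In the second call the single letter 'r' is left alone because it was classified as an initial, which is exactly the case a blanket replacement during cleaning would get wrong.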
unit_number output field starts with 'unit' (as in, for example,
'unit42'), then remove the 'unit' prefix from the unit_number
field and make the unit_type field equal to 'unit'. There is no
need to invent a new rule specification language to implement
this - Python is already easy enough for users of Febrl to write
their own transformation rules. All the user needs to do is
specify a function name for the transformation of the output
field. However, some more object-oriented refactoring would make
this easier to implement, e.g. if the output was an object
(class instance) which could be passed to such transformation
functions.
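A user-written rule of this kind might look like the following sketch. It is not part of Febrl: the record is assumed to be a plain dictionary keyed by output field name, and split_unit_prefix, OUTPUT_TRANSFORMS and apply_output_transforms are hypothetical names chosen for the example.

```python
def split_unit_prefix(record):
    """Move a leading 'unit' from unit_number into unit_type."""
    unit_number = record.get("unit_number", "")
    if unit_number.startswith("unit"):
        # Keep only what follows the prefix, e.g. 'unit42' -> '42'.
        record["unit_number"] = unit_number[len("unit"):].strip()
        record["unit_type"] = "unit"
    return record


# Hypothetical registry: output field name -> user-supplied function.
OUTPUT_TRANSFORMS = {"unit_number": split_unit_prefix}


def apply_output_transforms(record, transforms=OUTPUT_TRANSFORMS):
    """Run each registered function whose field is present in the record."""
    for field, func in transforms.items():
        if field in record:
            record = func(record)
    return record


record = {"unit_number": "unit42", "unit_type": ""}
print(apply_output_transforms(record))
# {'unit_number': '42', 'unit_type': 'unit'}
```

Passing the whole record to the function, rather than a single field value, is what lets one rule update both unit_number and unit_type.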
Field1 = Wayfare number
Field2 = Wayfare name and type
Field3 = Locality
Field4 = State/Territory
Field5 = Postcode
Field1 Field2, Field3 Field4 Field5
23 Smith St North, Fairfield NSW 2345
23 Smith St, North Fairfield NSW 2345
23 Smith St North Fairfield NSW 2345
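The three addresses above differ only in the comma. As a rough sketch, assuming the comma marks the boundary between Field2 and Field3 as in the format line, a hypothetical helper (split_at_comma is not a Febrl function) can show how 'North' moves from the wayfare part to the locality part, and how the comma-free form gives no boundary at all.

```python
def split_at_comma(address):
    """Return (wayfare_part, locality_part), or None if there is no comma."""
    before, sep, after = address.partition(",")
    if not sep:
        # No comma: the boundary between Field2 and Field3 is unknown.
        return None
    return before.strip(), after.strip()


for line in ("23 Smith St North, Fairfield NSW 2345",
             "23 Smith St, North Fairfield NSW 2345",
             "23 Smith St North Fairfield NSW 2345"):
    print(split_at_comma(line))
```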
Wymallee Arthur St Gundagai NSW 2345
Windy Willows Ave Littleville NSW 2345