10.2.2 Additional Data Files

Additional information is used in the Febrl geocoding system during the pre-processing step to verify and correct (if possible) postcode and locality name values, and in the matching engine to enable searching for matches in neighbouring regions (postcodes and suburbs) if no exact match can be found.

Australia Post publishes a look-up table containing postcode and suburb information, which can be used when processing the G-NAF locality files to verify and even correct wrong or missing postcodes and locality names (i.e. suburbs). For example, if a postcode is missing in a record, the Australia Post suburb look-up table can be used to find the official postcode for this suburb, and if this is a unique postcode it can be safely imputed into the record. Similarly, missing locality names can be imputed if they correspond to a unique postcode.
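The imputation rule described above can be sketched as follows. This is a minimal illustration only, not the Febrl implementation: the function name impute_locality_fields, the record layout, and the two look-up dictionaries are assumptions, and the suburb names and postcodes are invented placeholders rather than Australia Post data.

```python
# Hypothetical look-up tables (invented values, not real Australia Post data).
# Each suburb maps to the set of postcodes listed for it, and vice versa.
SUBURB_TO_POSTCODES = {
    "alphaville": {"9001"},
    "betatown": {"9002", "9003"},  # ambiguous: two postcodes
}
POSTCODE_TO_SUBURBS = {
    "9001": {"alphaville"},
    "9002": {"betatown"},
    "9003": {"betatown", "gammaford"},  # ambiguous: two suburbs
}


def impute_locality_fields(record):
    """Return a copy of record with a missing postcode or suburb filled in.

    A value is imputed only when the look-up table gives exactly one
    candidate; ambiguous or unknown values are left untouched.
    """
    result = dict(record)
    suburb = (result.get("suburb") or "").strip().lower()
    postcode = (result.get("postcode") or "").strip()

    if suburb and not postcode:
        candidates = SUBURB_TO_POSTCODES.get(suburb, set())
        if len(candidates) == 1:
            result["postcode"] = next(iter(candidates))
    elif postcode and not suburb:
        candidates = POSTCODE_TO_SUBURBS.get(postcode, set())
        if len(candidates) == 1:
            result["suburb"] = next(iter(candidates))
    return result


if __name__ == "__main__":
    print(impute_locality_fields({"suburb": "Alphaville", "postcode": ""}))
    print(impute_locality_fields({"suburb": "Betatown", "postcode": ""}))
```

In this sketch the first record receives a postcode because its suburb maps to exactly one, while the second is left unchanged because two postcodes are possible.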

Other look-up tables are used to find neighbouring regions for postcodes and suburbs, i.e. for a given region these tables contain all its neighbours. Table 10.3 shows some example neighbouring postcode values. Look-up tables of both level 1 (direct neighbours) and level 2 (direct plus indirect neighbours, i.e. neighbours of direct neighbours) are used in the fuzzy geocode matching engine to find matches for addresses where no exact postcode or suburb match can be found. Experience shows that people often record a different postcode or suburb value if a neighbouring postcode or suburb has a higher perceived social status (e.g. 'Double Bay' and 'Edgecliff'), or if they live close to the border of such regions.
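The relationship between the two table levels, and the way a search can widen from an exact match to neighbouring regions, can be sketched as below. This is an assumption-laden illustration rather than the Febrl matching engine: the names build_level2 and find_match, the dictionary layout, and the postcode values are all hypothetical, and in Febrl the tables are prepared in advance rather than derived at run time.

```python
# Hypothetical level 1 table: each postcode maps to its direct neighbours.
# The values are invented placeholders, not real postcode data.
LEVEL1 = {
    "9001": {"9002"},
    "9002": {"9001", "9003"},
    "9003": {"9002"},
    "9009": set(),  # isolated region with no neighbours
}


def build_level2(level1):
    """Derive a level 2 table: direct neighbours plus their neighbours."""
    level2 = {}
    for region, direct in level1.items():
        reachable = set(direct)
        for neighbour in direct:
            reachable |= level1.get(neighbour, set())
        reachable.discard(region)  # a region is not its own neighbour
        level2[region] = reachable
    return level2


def find_match(postcode, index, level1, level2):
    """Search the exact postcode first, then level 1, then level 2.

    index maps a postcode to a list of candidate addresses. Returns the
    level at which matches were found (0 = exact) and the matches, or
    (None, []) if nothing is found at any level.
    """
    search_sets = [
        {postcode},
        level1.get(postcode, set()),
        level2.get(postcode, set()),
    ]
    for level, candidates in enumerate(search_sets):
        hits = [addr for pc in sorted(candidates) for addr in index.get(pc, [])]
        if hits:
            return level, hits
    return None, []


if __name__ == "__main__":
    level2 = build_level2(LEVEL1)
    index = {"9003": ["1 Example Street"]}
    print(find_match("9001", index, LEVEL1, level2))
```

Searching the narrowest region first means that a wider, less reliable neighbourhood is consulted only when the narrower one yields nothing.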

The neighbouring region look-up tables are created using geographical data extracted from a commercial GIS, and are integrated into the Febrl geocode matching engine. See Section 14.5 for more details about neighbouring region look-up tables.

Table 10.3: Example NSW postcodes with their direct and indirect neighbours.
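(Table body not recovered. The surviving fragment shows a three-column table whose first column is headed 'Postcodes', and a final row for postcode 2898 (Lord Howe Island), which has no entries in either neighbour column.)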

