In many countries, geographical data is collected by various state and territory agencies. In Australia, for example, each state and territory has its own government agency that collects data for land planning as well as for property, infrastructure or resource management. Additionally, national bodies responsible for post and telecommunications, electoral rolls and statistics collect their own data. All these data sets are collected for specific purposes, have varying content and are stored in different formats.
The need for a nation-wide, standardised and high-quality geocoded data set has been recognised in Australia since 1990, and after years of planning, collaboration and development, G-NAF was first released in March 2004. Approximately 32 million address records from 13 organisations were used in a five-phase cleaning and integration process, resulting in a database consisting of 22 normalised files (or tables). Figure 10.2 shows the simplified data model of the 10 main G-NAF files.
G-NAF is based on a hierarchical model, which stores information about address sites separately from locations and streets. It is possible to have multiple geocoded locations for a single address, and vice versa, and aliases are available at various levels. Three geocode files contain location (longitude and latitude) information at different levels of detail. If an exact address match can be found, its location can be retrieved from the address-level geocode file. If there is only a match on street (but not street number), the STREET_LOCALITY_GEOCODE file will provide an overall street geocode. Finally, if no street-level match can be found, the LOCALITY_GEOCODE file contains geocode information for localities (e.g. towns and suburbs). Both the STREET_LOCALITY_GEOCODE and LOCALITY_GEOCODE files also contain information about the extent of the street and locality.
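
The three-level fallback described above can be illustrated with a minimal Python sketch. It is not the G-NAF schema or the matching procedure used in this project: the dictionaries are hypothetical in-memory stand-ins for the address-level, STREET_LOCALITY_GEOCODE and LOCALITY_GEOCODE files, the function name `lookup_geocode` is an assumption, and the address and coordinates are made-up placeholder values. Exact key matching is also an assumption made only to keep the example short.

```python
from typing import NamedTuple, Optional


class Geocode(NamedTuple):
    latitude: float
    longitude: float
    level: str  # which file supplied the result: "address", "street" or "locality"


# Hypothetical stand-ins for the three geocode files.
# The place names and coordinates are placeholders, not real G-NAF records.
ADDRESS_GEOCODES = {
    ("12", "example street", "exampletown"): (-33.0010, 151.0010),
}
STREET_GEOCODES = {
    ("example street", "exampletown"): (-33.0020, 151.0020),
}
LOCALITY_GEOCODES = {
    "exampletown": (-33.0030, 151.0030),
}


def lookup_geocode(number: str, street: str, locality: str) -> Optional[Geocode]:
    """Return the most precise geocode available, falling back level by level."""
    # 1. Exact address match: street number, street and locality all agree.
    address_key = (number, street, locality)
    if address_key in ADDRESS_GEOCODES:
        return Geocode(*ADDRESS_GEOCODES[address_key], "address")

    # 2. Street match only: use the overall street geocode.
    street_key = (street, locality)
    if street_key in STREET_GEOCODES:
        return Geocode(*STREET_GEOCODES[street_key], "street")

    # 3. No street-level match: fall back to the locality geocode.
    if locality in LOCALITY_GEOCODES:
        return Geocode(*LOCALITY_GEOCODES[locality], "locality")

    # Nothing matched at any level.
    return None
```

Returning the level alongside the coordinates lets a caller distinguish a precise address-level location from a coarser street or locality centroid.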
For our project we only used the G-NAF records covering the Australian state of New South Wales (NSW), containing around 4 million address, 60,000 street and 5,000 locality records. Table 10.1 gives an overview of the size and content of the 10 main G-NAF data files used.