The process-gnaf.py program works by building in-memory hash
table data structures for all the fields (or attributes) in the G-NAF
data files, and it therefore needs a large amount of main memory and
takes considerable processing time. For example, processing the New South
Wales part of G-NAF (containing around 4 million address site records)
on a Sun Enterprise 450 shared-memory (SMP) server with four
480 MHz UltraSPARC-II processors and 4 GB of main
memory used around 3.3 GB of main memory
and took around 34.5 hours (with all processing flags in
process-gnaf.py set to True
). There are two ways to reduce the
amount of memory needed. The first is to set save_pickle_files
to True and save_shelve_files to False, start the
pre-processing, and, once it has finished, swap the flags
(i.e. set save_pickle_files to False and save_shelve_files to True)
and restart the pre-processing. The second is to set only one of
process_locality_files, process_street_files, process_address_files,
create_reverse_lookup_shelve, and create_gnaf_address_csv_file
to True and all others to False, and repeat this process until all
pre-processing is done.
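The memory trade-off behind the first approach comes from how Python's pickle and shelve modules store a lookup table. The sketch below (not taken from process-gnaf.py; the file names and data are illustrative) shows the difference: a pickle file serialises one fully built in-memory dictionary, while a shelve file is a disk-backed dictionary whose entries can be written and read key by key, so the whole table need not be resident in memory at once.

```python
import os
import pickle
import shelve
import tempfile

# A small stand-in for one G-NAF attribute lookup table.
lookup = {"PID%d" % i: {"street": "Street %d" % i} for i in range(1000)}

work_dir = tempfile.mkdtemp()

# Pickle: the entire dictionary must exist in memory, then is
# written to disk in a single serialisation step.
pickle_path = os.path.join(work_dir, "lookup.pkl")
with open(pickle_path, "wb") as f:
    pickle.dump(lookup, f)

# Shelve: a persistent, disk-backed dictionary.  Entries are
# stored individually, so large tables can be built and queried
# without holding every record in memory at the same time.
shelve_path = os.path.join(work_dir, "lookup_shelve")
db = shelve.open(shelve_path)
for key, value in lookup.items():
    db[key] = value
db.close()

# Later, individual records are fetched from disk on demand.
db = shelve.open(shelve_path)
street42 = db["PID42"]["street"]
db.close()
```

Writing the pickle files first and converting to shelve files in a second pass, as described above, avoids paying both the in-memory and the on-disk-dictionary cost in a single run.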