13.4 Shelve Data Set Implementation

The shelve data set uses the Python standard module shelve.py, which provides a file-based hash table (dictionary) for efficient storage of and access to arbitrary records. A shelve data set is therefore an efficient and convenient implementation for temporarily storing records on disk. This data set implementation supports direct random access only.
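As a minimal sketch of the underlying mechanism, the following uses the standard shelve module directly, not the DataSetShelve class. The temporary directory, the file name and the use of record numbers converted to strings as keys are assumptions made for illustration only.

```python
import os
import shelve
import tempfile

# Hypothetical scratch location; any writable path would do.
tmp_dir = tempfile.mkdtemp()
shelf_path = os.path.join(tmp_dir, 'hospital')

# Open (and create) the shelve. Keys must be strings; values can be
# any object that can be pickled, such as a dictionary.
shelf = shelve.open(shelf_path)
shelf['0'] = {'surname': 'miller', 'givenname': 'peter', 'state': 'act'}
shelf['42'] = {'surname': 'winkler', 'givenname': 'harry'}
shelf.close()

# Re-open the file later: the records are still on disk and can be
# fetched by key in any order (direct random access).
shelf = shelve.open(shelf_path)
record_42 = shelf['42']
num_stored = len(shelf)
shelf.close()
```

Because the shelve behaves like a dictionary, a record is fetched by its key alone, without scanning the other records.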

Note: The Python shelve module is built on top of a database library such as dbm, gdbm or bsddb. In Python version 2.2 these were badly broken (i.e. they crashed when several thousand or more records were loaded into a shelve). With Python 2.3 and 2.4 this should now work.

The fields attribute of a shelve data set must be a dictionary where the keys are the field names. The corresponding values are not used and can thus be anything, e.g. an empty string or an integer number. They are not needed to access records or the fields within records.
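Since only the keys matter, such a dictionary can be built from a plain list of field names. The sketch below uses the field names from the example in this section; the check for unknown field names is an illustration only and is not claimed to be something the data set does itself.

```python
# Field names taken from the example in this section.
field_names = ['year', 'surname', 'givenname', 'dob',
               'address', 'postcode', 'state']

# Only the keys are used, so every field gets the same placeholder value.
fields = dict.fromkeys(field_names, '')

# Illustration only: find field names in a record that are not
# defined in the fields dictionary.
record = {'surname': 'miller', 'state': 'act'}
unknown = [name for name in record if name not in fields]
```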

Two additional attributes (besides the general data set attributes as described above) for a shelve data set are file_name and clear, both of which are set in the example below.

The following example shows how to initialise a shelve data set and how to access it in read/write mode. It is assumed that the dataset.py module has been imported using the import dataset command.

# ====================================================================

mydata = dataset.DataSetShelve(name = 'hospital-data',
                        description = 'Hospital data from 1990-2000',
                       access_right = 'readwrite',
                          file_name = 'hospital',
                              clear = True,
                             fields = {'year':'',
                                       'surname':'',
                                       'givenname':'',
                                       'dob':'',
                                       'address':'',
                                       'postcode':'',
                                       'state':''},
                     fields_default = '',
                     missing_values = ['','missing'])

print mydata.num_records  # Print total number of records

first_record = {'surname':'miller','givenname':'peter','state':'act',
                '_rec_num_':0}

mydata.write_record(first_record)

more_records = [{'surname':'smith','givenname':'dave','dob':'1966',
                 '_rec_num_':1},
                {'surname':'winkler','givenname':'harry',
                 '_rec_num_':42},
                {'surname':'paul','postcode':'2100','state':'nsw',
                 '_rec_num_':0}]

mydata.write_records(more_records)

print mydata.num_records  # Print total number of records (3)

record = mydata.read_record(42)

record_list = mydata.read_records([0,1])

mydata.re_initialise()  # Re-initialise data set in readwrite mode

record = mydata.read_record(1)

mydata.write_record(first_record)

mydata.re_initialise('read')  # Re-initialise data set in read mode

record_list2 = mydata.read_records([42,0,1])

mydata.finalise()  # Close file, finalise access to data set

Note that record numbers (values in the hidden field _rec_num_) need not form a consecutive range, as shown in the example above. Also, a record can be overwritten at any time by writing a new record with an already existing record number to the data set.
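The overwrite behaviour follows naturally from dictionary semantics. The sketch below reproduces the record numbers of the example above with the standard shelve module; write_record here is a hypothetical helper and the file location is an assumption, so this is not the DataSetShelve implementation itself.

```python
import os
import shelve
import tempfile

def write_record(shelf, record):
    # Hypothetical helper: store the record under its '_rec_num_' value.
    # Shelve keys must be strings, so the number is converted.
    shelf[str(record['_rec_num_'])] = record

# Hypothetical scratch file for the sketch.
shelf = shelve.open(os.path.join(tempfile.mkdtemp(), 'records'))

records = [
    {'surname': 'miller', 'givenname': 'peter', 'state': 'act',
     '_rec_num_': 0},
    {'surname': 'smith', 'givenname': 'dave', 'dob': '1966',
     '_rec_num_': 1},
    {'surname': 'winkler', 'givenname': 'harry', '_rec_num_': 42},
    # Same record number as the first record, so it replaces it.
    {'surname': 'paul', 'postcode': '2100', 'state': 'nsw',
     '_rec_num_': 0},
]
for rec in records:
    write_record(shelf, rec)

num_records = len(shelf)            # Three distinct keys: '0', '1', '42'
surname_0 = shelf['0']['surname']   # The later write for number 0 wins
shelf.close()
```

Four writes leave three stored records, because the two records numbered 0 share one key and the later one replaces the earlier.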