The shelve data set uses the Python standard module shelve.py, which provides a file-based hash table (dictionary) that allows efficient storage and access of arbitrary records. Thus, a shelve data set becomes an efficient and convenient data set implementation for temporary persistent storage of records. This data set implementation is for direct random access only.
The shelve module relies on one of the underlying database modules dbm, gdbm or bsddb. In Python version 2.2 these were badly broken (i.e. they crashed when trying to load several thousand records or more into a shelve). With Python 2.3 and 2.4 this should now work.
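As a rough illustration of the file-based dictionary that a shelve data set builds on, the following minimal sketch uses the standard shelve module directly. The temporary directory, the file name demo_shelve and the string keys are assumptions made for this illustration only; they are not taken from the dataset.py implementation.

```python
import os
import shelve
import tempfile

# Hypothetical location for the demo; any writable path would do.
tmp_dir = tempfile.mkdtemp()
shelf_path = os.path.join(tmp_dir, 'demo_shelve')

# flag='c' opens the file for reading and writing, creating it if needed.
db = shelve.open(shelf_path, flag='c')
db['0'] = {'surname': 'miller', 'givenname': 'peter', 'state': 'act'}
db['42'] = {'surname': 'winkler', 'givenname': 'harry'}
db.close()

# Re-open read-only and fetch one record directly by its key.
db = shelve.open(shelf_path, flag='r')
record = db['42']
num_records = len(db)
db.close()
```

Shelve keys must be strings, which is why the record numbers are written as '0' and '42' here. The values can be any object that can be pickled.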
The fields attribute of a shelve data set must be a dictionary where the keys are the field names. The corresponding values are not used and can thus be anything, e.g. an empty string or an integer number. They are not needed to access records or the fields within records.
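Since only the keys of the fields dictionary matter, one convenient way to build it is from a plain list of field names. This is only a sketch; the variable name field_names is hypothetical, and the empty string is an arbitrary placeholder value.

```python
field_names = ['year', 'surname', 'givenname', 'dob',
               'address', 'postcode', 'state']

# Only the keys matter; the empty string is an arbitrary placeholder value.
fields = dict.fromkeys(field_names, '')
```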
Two additional attributes (besides the general data set attributes as described above) for a shelve data set are:

file_name: The name of the underlying shelve file.

clear: A flag, either True or False. If set to True, the content of the database will be cleared when it is opened in write or readwrite access mode. Otherwise, the existing content is kept. In read access mode this attribute has no effect (clearing a database opened in read-only access would leave no records to read).
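The behaviour of the clear attribute can be sketched with the flags of the standard shelve module. The helper open_shelve below is hypothetical and is not the actual dataset.py code; it only assumes that clearing corresponds to the shelve flag 'n' (always create a new, empty database) and keeping corresponds to 'c'.

```python
import os
import shelve
import tempfile

def open_shelve(file_name, access='read', clear=False):
    """Hypothetical helper: map an access mode and clear flag to a shelve flag."""
    if access == 'read':
        flag = 'r'      # read-only access: 'clear' is ignored
    elif clear:
        flag = 'n'      # always create a new, empty database
    else:
        flag = 'c'      # keep existing content, create the file if missing
    return shelve.open(file_name, flag=flag)

path = os.path.join(tempfile.mkdtemp(), 'clear_demo')

db = open_shelve(path, 'readwrite', clear=True)
db['0'] = {'surname': 'miller'}
db.close()

db = open_shelve(path, 'readwrite', clear=False)
kept = len(db)           # the existing record is still there
db.close()

db = open_shelve(path, 'write', clear=True)
after_clear = len(db)    # the database was emptied on opening
db.close()
```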
The following example shows how to initialise a shelve data set and how to access it in read/write mode. It is assumed that the dataset.py module has been imported using the import dataset command.
# ====================================================================

mydata = dataset.DataSetShelve(name = 'hospital-data',
                               description = 'Hospital data from 1990-2000',
                               access_right = 'readwrite',
                               file_name = 'hospital',
                               clear = True,
                               fields = {'year':'', 'surname':'',
                                         'givenname':'', 'dob':'',
                                         'address':'', 'postcode':'',
                                         'state':''},
                               fields_default = '',
                               missing_values = ['','missing'])

print mydata.num_records  # Print total number of records

first_record = {'surname':'miller','givenname':'peter','state':'act',
                '_rec_num_':0}
mydata.write_record(first_record)

more_records = [{'surname':'smith','givenname':'dave','dob':'1966',
                 '_rec_num_':1},
                {'surname':'winkler','givenname':'harry',
                 '_rec_num_':42},
                {'surname':'paul','postcode':'2100','state':'nsw',
                 '_rec_num_':0}]
mydata.write_records(more_records)

print mydata.num_records  # Print total number of records (3)

record = mydata.read_record(42)
record_list = mydata.read_records([0,1])

mydata.re_initialise()  # Re-initialise data set in readwrite mode

record = mydata.read_record(1)
mydata.write_record(first_record)

mydata.re_initialise('read')  # Re-initialise data set in read mode

record_list2 = mydata.read_records([42,0,1])

mydata.finalise()  # Close file, finalise access to data set
Note that record numbers (values in the hidden field _rec_num_) do not necessarily need to be in a consecutive range, as shown in the example above. Also, records can be overwritten at any time if a new record with an already existing record number value is written to the data set.
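The overwrite behaviour follows from dictionary semantics: a record number acts as a key, so writing a second record under the same number replaces the first. The sketch below uses a plain in-memory dictionary as a stand-in for the shelve; the names store and write_record are hypothetical and do not reproduce the dataset.py methods.

```python
store = {}   # stands in for the shelve; hypothetical, in-memory only

def write_record(record):
    """Store a record under its '_rec_num_' value, replacing any earlier one."""
    store[str(record['_rec_num_'])] = record

write_record({'surname': 'miller', '_rec_num_': 0})
write_record({'surname': 'smith', '_rec_num_': 1})
write_record({'surname': 'winkler', '_rec_num_': 42})
write_record({'surname': 'paul', '_rec_num_': 0})   # overwrites record 0

num_records = len(store)            # three distinct record numbers: 0, 1, 42
surname_0 = store['0']['surname']   # the later write for record 0 wins
```

This mirrors the count of 3 in the example above: four records are written, but two of them share record number 0.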