The shelve data set uses the Python standard module shelve.py, which provides a file-based hash table (dictionary) that allows efficient storage of and access to arbitrary records. A shelve data set is therefore an efficient and convenient implementation for the temporary persistent storage of records. This data set implementation supports direct random access only.
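To illustrate what a file-based dictionary means in practice, the following minimal sketch uses Python's standard shelve module directly (Python 3 syntax). It is not Febrl code. Record dictionaries are stored under string keys, because shelve only accepts string keys; using the record number as the key is an assumption made for illustration.

```python
import os
import shelve
import tempfile

# A shelf behaves like a persistent dictionary: keys are strings and
# values can be any picklable Python object (here: a record dictionary).
path = os.path.join(tempfile.mkdtemp(), "records")

with shelve.open(path, flag="n") as db:   # "n": always create a new, empty shelf
    db["0"] = {"surname": "miller", "givenname": "peter", "state": "act"}
    db["42"] = {"surname": "winkler", "givenname": "harry"}

# Re-open the same file later and fetch one record directly by its key,
# without reading any of the other records.
with shelve.open(path, flag="r") as db:   # "r": read-only access
    record = db["42"]
    num_records = len(db)

print(record["surname"], num_records)
```

Because the data lives in a file, the records survive after the shelf is closed, and each lookup goes straight to the requested key.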
Internally, the shelve module stores its data using one of the database libraries dbm, gdbm or bsddb. Unfortunately, these database libraries seem to be badly broken when used within Python 2.2 (i.e. they crash when trying to load several thousand records into a shelve). Starting with Febrl version 0.2.1 we therefore support the use of the external module bsddb3, which can be downloaded from http://pybsddb.sourceforge.net. The Berkeley database library itself is available under an open source software license from Sleepycat Software at http://www.sleepycat.com. The shelve data set implementation automatically detects whether the bsddb3 module is installed and uses it if it is available.
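The automatic detection can be sketched with a guarded import. The helper open_shelf() below is hypothetical and is not Febrl's actual code. Its bsddb3 branch assumes that the module provides the classic hashopen() function of the old bsddb interface; if bsddb3 cannot be imported, it falls back to the standard shelve module, which picks whatever dbm backend is available.

```python
import os
import shelve
import tempfile


def open_shelf(file_name, flag="c"):
    """Open a shelf, preferring the external bsddb3 module if installed.

    Hypothetical helper; returns the shelf and the name of the backend used.
    """
    try:
        import bsddb3  # optional external Berkeley DB bindings
    except ImportError:
        # bsddb3 is not installed: use the standard library shelve module.
        return shelve.open(file_name, flag=flag), "shelve"
    # Assumption: bsddb3 offers hashopen(file_name, flag) like the old bsddb.
    return shelve.BsdDbShelf(bsddb3.hashopen(file_name, flag)), "bsddb3"


# Usage: open a shelf in a fresh temporary directory and store one record.
db, backend = open_shelf(os.path.join(tempfile.mkdtemp(), "hospital"))
db["0"] = {"surname": "miller", "givenname": "peter"}
stored = db["0"]
db.close()
print("backend:", backend)
```

The caller never needs to know which backend was chosen, which matches the idea that the data set switches to bsddb3 transparently.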
The fields attribute of a shelve data set must be a dictionary whose keys are the field names. The corresponding values are not used and can thus be anything, e.g. an empty string or an integer. They are not needed to access records or the fields within records.
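Since only the keys of the fields dictionary matter, such a dictionary can be built from a plain list of field names with the standard dict.fromkeys() method. This is a small sketch using the field names from the example in this section; the empty-string placeholder is an arbitrary choice.

```python
# Only the keys are used as field names; every value is just a placeholder.
field_names = ['year', 'surname', 'givenname', 'dob',
               'address', 'postcode', 'state']
fields = dict.fromkeys(field_names, '')

print(sorted(fields))  # the field names, in alphabetical order
```

The resulting dictionary is equivalent to the one written out by hand in the example and can be passed as the fields argument.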
Two additional attributes (besides the general data set attributes described above) for a shelve data set are file_name, the name of the file holding the shelve, and clear, a flag that can be set to True or False. If clear is set to True, the content of the database is cleared when it is opened in write or readwrite access mode; otherwise the existing content is kept. In read access mode this attribute has no effect (clearing a database opened for reading only would leave no records to read).
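The interaction between the access mode and the clear flag can be expressed as a choice of the standard shelve.open() flag: 'r' opens read-only, 'n' always creates a new empty database and 'c' opens for reading and writing, creating the file if needed. The helper shelve_open_flag() below is hypothetical and only sketches the behaviour described above; it is not taken from Febrl.

```python
import os
import shelve
import tempfile


def shelve_open_flag(access_right, clear):
    """Map a data set access mode and clear flag to a shelve.open() flag."""
    if access_right == 'read':
        return 'r'  # clear has no effect when reading
    if access_right in ('write', 'readwrite'):
        return 'n' if clear else 'c'  # 'n' empties the database, 'c' keeps it
    raise ValueError('Illegal access right: %s' % access_right)


path = os.path.join(tempfile.mkdtemp(), 'hospital')

with shelve.open(path, flag=shelve_open_flag('readwrite', True)) as db:
    db['0'] = {'surname': 'miller'}

with shelve.open(path, flag=shelve_open_flag('readwrite', False)) as db:
    kept = len(db)     # existing record is still there

with shelve.open(path, flag=shelve_open_flag('readwrite', True)) as db:
    cleared = len(db)  # database was emptied on opening

print(kept, cleared)
```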
The following example shows how to initialise a shelve data set and how to access it in read/write mode. It is assumed that the dataset.py module has been imported using the import dataset command.

    # ====================================================================
    import dataset

    mydata = dataset.DataSetShelve(name = 'hospital-data',
                                   description = 'Hospital data from 1990-2000',
                                   access_right = 'readwrite',
                                   file_name = 'hospital',
                                   clear = True,
                                   fields = {'year':'', 'surname':'', 'givenname':'',
                                             'dob':'', 'address':'', 'postcode':'',
                                             'state':''},
                                   fields_default = '',
                                   missing_values = ['','missing'])

    print(mydata.num_records)  # Print total number of records

    first_record = {'surname':'miller', 'givenname':'peter', 'state':'act',
                    '_rec_num_':0}
    mydata.write_record(first_record)

    more_records = [{'surname':'smith', 'givenname':'dave', 'dob':'1966',
                     '_rec_num_':1},
                    {'surname':'winkler', 'givenname':'harry', '_rec_num_':42},
                    {'surname':'paul', 'postcode':'2100', 'state':'nsw',
                     '_rec_num_':0}]
    mydata.write_records(more_records)

    print(mydata.num_records)  # Print total number of records (3)

    record = mydata.read_record(42)
    record_list = mydata.read_records([0,1])

    mydata.re_initialise()  # Re-initialise data set in readwrite mode

    record = mydata.read_record(1)
    mydata.write_record(first_record)

    mydata.re_initialise('read')  # Re-initialise data set in read mode

    record_list2 = mydata.read_records([42,0,1])

    mydata.finalise()  # Close file, finalise access to data set

Note that record numbers (the values of the hidden field _rec_num_) do not need to form a consecutive range, as shown in the example above. Also, a record can be overwritten at any time by writing a new record with an already existing record number to the data set.
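Both properties follow naturally from storing records in a dictionary keyed by record number. The sketch below replays the write sequence of the example with the standard shelve module only; keying records by str(_rec_num_) is an assumption for illustration and not necessarily how Febrl stores them internally.

```python
import os
import shelve
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'hospital')

records = [
    {'surname': 'miller', 'givenname': 'peter', 'state': 'act', '_rec_num_': 0},
    {'surname': 'smith', 'givenname': 'dave', 'dob': '1966', '_rec_num_': 1},
    {'surname': 'winkler', 'givenname': 'harry', '_rec_num_': 42},
    {'surname': 'paul', 'postcode': '2100', 'state': 'nsw', '_rec_num_': 0},
]

with shelve.open(path, flag='n') as db:
    for rec in records:
        # Shelve keys must be strings, so the record number is converted.
        # Writing an existing key silently replaces the stored record.
        db[str(rec['_rec_num_'])] = rec
    stored_numbers = sorted(int(key) for key in db.keys())
    record_0 = db['0']

print(stored_numbers)       # gaps between numbers are fine
print(record_0['surname'])  # the later write to number 0 replaced the first
```

Four writes leave three stored records (numbers 0, 1 and 42), which matches the record count printed in the example.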