Text files with fixed-width fields are commonly used. Fields are
normally specified either by start column and field width, or by
start and end column. Febrl uses the start column (counting from
zero) and the field width (or length) to define fields. Such files
often have the file extension '.txt'. This data set implementation
allows sequential access only.
The fields attribute of a COL data set must be a dictionary where
the keys are the field names and the values are tuples containing
the start column (starting from zero) and the field width.
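
As an illustration of the (start column, field width) convention, the following minimal sketch slices one line of a fixed-width file into a dictionary. The function parse_fixed_width and the sample line are hypothetical and are not part of Febrl. Stripping white space from the values is an assumption made here for readability.

```python
def parse_fixed_width(line, fields):
    """Split one fixed-width line into a dictionary of field values.

    'fields' maps field names to (start column, field width) tuples,
    with start columns counted from zero. Hypothetical helper, not
    part of Febrl.
    """
    record = {}
    for name, (start, width) in fields.items():
        # Slicing past the end of a short line simply yields a shorter
        # (possibly empty) string, so no length check is needed here
        record[name] = line[start:start + width].strip()
    return record


fields = {'year': (0, 4), 'surname': (4, 10), 'givenname': (14, 10)}

# Build a sample line: 4 columns year, 10 columns surname, 10 columns given name
line = '1995' + 'Miller'.ljust(10) + 'Anna'.ljust(10)

record = parse_fixed_width(line, fields)
short_record = parse_fixed_width('19', fields)  # Line shorter than the layout
```

With the sample line, record maps 'year' to '1995', 'surname' to 'Miller' and 'givenname' to 'Anna'. A line shorter than the layout gives truncated or empty values rather than an error.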
Additional attributes (besides the general data set attributes
described above) for a COL data set are file_name, header_lines and
write_header. The write_header attribute can be set to True or
False. If set to True, a header line with the field names is written
at the beginning of the COL file if the data set is initialised in
write mode, or in append mode if the file is empty. The default
value is False, i.e. no header line will be written. Note that field
names will be truncated or padded with white space in order to fit
into the field formats (the number of columns defined for the
fields).
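
The truncation and padding of field names can be sketched as follows. This is an illustration only, not Febrl's implementation: the function build_header_line is hypothetical, and it assumes the fields are contiguous, as a COL data set in write or append mode requires.

```python
def build_header_line(fields):
    """Return a header line with each field name fitted to its width.

    Hypothetical helper, not part of Febrl. Assumes the field
    definitions are contiguous (no overlaps, no gaps).
    """
    # Order the fields by start column so names appear in file order
    ordered = sorted(fields.items(), key=lambda item: item[1])
    parts = []
    for name, (start, width) in ordered:
        # Truncate names that are too long, pad short ones with spaces
        parts.append(name[:width].ljust(width))
    return ''.join(parts)


fields = {'year': (0, 4), 'surname': (4, 10), 'postcode': (14, 4)}
header = build_header_line(fields)
```

Here 'surname' is padded to ten columns and 'postcode' is cut down to 'post', so the header line is exactly as wide as a data line (18 columns).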
Field definitions may overlap, as in the following example:

    fields = {'hospitalcode':(10,4),
              'year':(14,8),
              'yearhospcode':(10,12),
              'name':(30,20),
              'address':(60,50)}

However, for COL data sets initialised in write or append mode, the
field definitions must neither overlap nor leave gaps between them.
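
A layout can be checked against this rule before a data set is opened for writing. The sketch below is a hypothetical helper and not part of Febrl. It sorts the fields by start column and reports every overlap or gap. It makes no assumption about where the first field starts.

```python
def check_contiguous(fields):
    """Return a list of overlap and gap problems in a field layout.

    'fields' maps field names to (start column, field width) tuples.
    Hypothetical helper, not part of Febrl. An empty list means the
    fields follow each other without overlaps or gaps.
    """
    problems = []
    ordered = sorted(fields.items(), key=lambda item: item[1])
    if not ordered:
        return problems

    # The next field is expected to start where the previous one ended
    expected_start = ordered[0][1][0]
    for name, (start, width) in ordered:
        if start < expected_start:
            problems.append('overlap at field %s' % name)
        elif start > expected_start:
            problems.append('gap before field %s' % name)
        expected_start = max(expected_start, start + width)
    return problems


# Contiguous layout, suitable for write or append mode
good_fields = {'year': (0, 4), 'surname': (4, 10), 'givenname': (14, 10)}

# Layout with overlapping fields and gaps, only usable for reading
bad_fields = {'hospitalcode': (10, 4), 'year': (14, 8),
              'yearhospcode': (10, 12), 'name': (30, 20),
              'address': (60, 50)}

good_problems = check_contiguous(good_fields)
bad_problems = check_contiguous(bad_fields)
```

For the second layout the check reports overlaps at 'yearhospcode' and 'year' and gaps before 'name' and 'address'. The first layout produces no problems.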
The following example shows how to initialise a COL data set and how
to access it in read mode. It is assumed that the dataset.py
module has been imported using the import dataset
command.

    # ====================================================================

    mydata = dataset.DataSetCOL(name = 'hospital-data',
                                description = 'Hospital data from 1990-2000',
                                access_right = 'read',
                                header_lines = 1,
                                file_name = './data/hospital.txt',
                                fields = {'year':(0,4),
                                          'surname':(4,10),
                                          'givenname':(14,10),
                                          'dob':(24,8),
                                          'address':(32,30),
                                          'postcode':(62,4),
                                          'state':(66,3)},
                                fields_default = '',
                                strip_fields = True,
                                missing_values = ['','missing'])

    print mydata.num_records  # Print total number of records

    first_record = mydata.read_record()  # Returns one record
    hundred_records = mydata.read_records(0,100)  # Read 100 records
    ten_records = mydata.read_records(2000,10)  # Read another 10 records

    mydata.finalise()  # Close file, finalise access to data set