Text files with comma-separated values (CSV) are common, as they are a
portable way to store data from spreadsheets or database tables. Such
files often have the file extension '.csv'. This data set
implementation allows sequential access only.
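Sequential access means records are delivered in file order, one after another, with no way to jump back. The following is a minimal sketch of that behaviour using only Python's standard csv module. The class SequentialCSVReader and its next_record_number counter are hypothetical illustrations, not part of the dataset module; only the method name read_record() mirrors the call used in the example further down.

```python
import csv
import io


class SequentialCSVReader:
    """Hypothetical reader: records can only be read in file order."""

    def __init__(self, text_stream):
        self._reader = csv.reader(text_stream)
        self.next_record_number = 0  # how many records have been read so far

    def read_record(self):
        """Return the next record as a list of strings, or None at end of file."""
        try:
            row = next(self._reader)
        except StopIteration:
            return None
        self.next_record_number += 1
        return row


reader = SequentialCSVReader(io.StringIO("1990,smith,john\n1991,miller,peter\n"))
print(reader.read_record())  # ['1990', 'smith', 'john']
print(reader.read_record())  # ['1991', 'miller', 'peter']
print(reader.read_record())  # None
```

Reading record n this way always means reading the n records before it first, since a plain text file has no index of record positions.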
The fields attribute of a CSV data set must be a dictionary
where the keys are the field names and the values are the
corresponding column numbers (starting with 0).
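To illustrate how such a fields dictionary selects columns, here is a small sketch in plain Python (standard csv module only, not the dataset module). The helper row_to_record is hypothetical, and filling columns beyond the end of a short row with a default value is an assumption of the sketch.

```python
import csv
import io

# Field name -> column number (starting with 0), in the same form as the
# fields attribute of a CSV data set.
fields = {'year': 0, 'surname': 1, 'givenname': 2}


def row_to_record(row, fields, default=''):
    """Map a parsed CSV row onto named fields (hypothetical helper).

    Columns that do not exist in a short row get the default value;
    this fallback is an assumption of the sketch.
    """
    return {name: (row[col] if col < len(row) else default)
            for name, col in fields.items()}


# The csv module handles quoted values that contain commas.
row = next(csv.reader(io.StringIO('1995,smith,"mary, jane"\n')))
print(row_to_record(row, fields))
# {'year': '1995', 'surname': 'smith', 'givenname': 'mary, jane'}
```

Because the values are column numbers, the dictionary can list fields in any order and skip columns that are not needed.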
Additional attributes (besides the general data set attributes described above) for a CSV data set are:
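file_name
The name of the CSV file.
header_lines
The number of header lines at the beginning of the CSV file. These
lines do not contain records.
write_header
True or False. If set to True, a header line with the field names is
written at the beginning of the CSV file when the data set is
initialised in write mode, or in append mode if the file is empty.
The default value is False, i.e. no header line is written.
write_quote_char
The quote character used when writing fields. The default value is
" (double quotes).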
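The writing-related attributes can be pictured with the standard csv module. The sketch below is hypothetical (write_records and its file_is_empty flag are not part of the dataset module): it writes the header only for a new or still empty file, orders columns by their numbers in the fields dictionary, and uses write_quote_char as the quote character. Quoting every field (csv.QUOTE_ALL) is an assumption made so that the quote character is visible; the data set itself may quote differently.

```python
import csv
import io


def write_records(out, fields, records, write_header=False,
                  write_quote_char='"', file_is_empty=True):
    """Hypothetical writer mirroring write_header and write_quote_char.

    Every field is quoted (csv.QUOTE_ALL); this is an assumption of the sketch.
    """
    # Order field names by their column number.
    names = sorted(fields, key=fields.get)
    writer = csv.writer(out, quotechar=write_quote_char,
                        quoting=csv.QUOTE_ALL, lineterminator='\n')
    # Write the header only for a new (write mode) or empty (append mode) file.
    if write_header and file_is_empty:
        writer.writerow(names)
    for record in records:
        writer.writerow([record.get(name, '') for name in names])


buf = io.StringIO()
write_records(buf, {'surname': 1, 'year': 0},
              [{'year': '1990', 'surname': 'smith'}], write_header=True)
print(buf.getvalue())
# "year","surname"
# "1990","smith"
```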
The following example shows how to initialise a CSV data set and how
to access it in read mode. It assumes that the dataset.py module has
been imported with the command import dataset.
# ====================================================================

mydata = dataset.DataSetCSV(name = 'hospital-data',
                            description = 'Hospital data from 1990-2000',
                            access_right = 'read',
                            header_lines = 1,
                            file_name = './data/hospital.csv',
                            fields = {'year':0,
                                      'surname':1,
                                      'givenname':2,
                                      'dob':12,
                                      'address':7,
                                      'postcode':8,
                                      'state':9},
                            fields_default = '',
                            strip_fields = True,
                            missing_values = ['','missing'])

print(mydata.num_records)  # Print total number of records
first_record = mydata.read_record()  # Returns one record
hundred_records = mydata.read_records(1000,100)  # Read 100 records
ten_records = mydata.read_records(2000,10)  # Read another 10 records
mydata.finalise()  # Close file, finalise access to data set
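The read_records() calls above can be approximated with the standard library alone. The following is a stand-alone sketch, not the dataset module's implementation, and several of its semantics are assumptions: the first argument is taken as the number of the first record (counted from 0 after the header lines) and the second as how many records to return, fields_default fills columns beyond the end of a row, and values listed in missing_values become None. Because the file can only be read sequentially, reaching record first means scanning past every record before it.

```python
import csv
import io
import itertools


def read_records(text_stream, fields, first, number, header_lines=0,
                 strip_fields=True, missing_values=('',), fields_default=''):
    """Hypothetical stand-in for read_records(first, number)."""
    # Skip the header lines as raw text lines.
    for _ in range(header_lines):
        text_stream.readline()
    reader = csv.reader(text_stream)
    # Scan past the records before `first`, then keep `number` rows.
    rows = itertools.islice(reader, first, first + number)
    records = []
    for row in rows:
        record = {}
        for name, col in fields.items():
            value = row[col] if col < len(row) else fields_default
            if strip_fields:
                value = value.strip()
            # Assumption: missing values are represented as None.
            record[name] = None if value in missing_values else value
        records.append(record)
    return records


data = io.StringIO("year,surname,givenname\n"
                   "1990, smith ,john\n"
                   "1991,miller,missing\n"
                   "1992,jones,\n")
print(read_records(data, {'year': 0, 'surname': 1, 'givenname': 2},
                   first=1, number=2, header_lines=1,
                   missing_values=('', 'missing')))
# [{'year': '1991', 'surname': 'miller', 'givenname': None},
#  {'year': '1992', 'surname': 'jones', 'givenname': None}]
```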