Text files with comma-separated values (CSV) are common, as they are a
portable way to store data from spreadsheets or database tables. Such
files often have the file extension '.csv'. This data set
implementation allows sequential access only.
The fields attribute of a CSV data set must be a dictionary where the
keys are the field names and the values are the corresponding column
numbers (starting with 0).
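To illustrate how such a fields dictionary picks values out of a CSV line, here is a minimal sketch using Python's standard csv module. The helper row_to_record is hypothetical and is not part of dataset.py.

```python
import csv
import io

def row_to_record(row, fields):
    """Map a list of column values to a {field_name: value} dictionary.

    fields maps each field name to a zero-based column number, in the
    same form as the fields attribute described above.
    (Hypothetical helper for illustration, not dataset.py code.)
    """
    return {name: row[col] for name, col in fields.items()}

fields = {'year': 0, 'surname': 1, 'givenname': 2}
line = '1995,smith,peter,extra column\n'
row = next(csv.reader(io.StringIO(line)))  # parse one CSV line into a list
record = row_to_record(row, fields)
print(record['surname'])  # smith
```

Columns that do not appear in fields (here the fourth one) are simply ignored, and the column numbers need not be in order, as the dob and address entries in the example below show.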
Additional attributes (besides the general data set attributes described above) for a CSV data set are:
- file_name: the name of the CSV file.
- header_lines: the number of header lines at the beginning of the file.
- write_header: True or False. If set to True, a header line with the
  field names is written at the beginning of the CSV file if the data
  set is initialised in write mode, or in append mode if the file is
  empty. The default value is False, i.e. no header line will be
  written.
- write_quote_char: the quote character used when writing to the
  file: " (double quotes).
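The header and quoting rules above can be sketched with the standard csv module. This is an assumption-laden illustration, not dataset.py's implementation: write_rows and read_rows are hypothetical helpers, quoting every value (csv.QUOTE_ALL) is an assumed policy, and header_lines is assumed to mean "skip this many leading rows when reading".

```python
import csv
import os
import tempfile

def write_rows(file_name, rows, field_names, mode='w',
               write_header=False, write_quote_char='"'):
    """Hypothetical writer mirroring the write_header rule described above."""
    # Check for a missing or empty file before open() creates or truncates it.
    file_is_empty = (not os.path.exists(file_name)
                     or os.path.getsize(file_name) == 0)
    with open(file_name, mode, newline='') as f:
        # Assumption: every value is quoted with write_quote_char.
        writer = csv.writer(f, quotechar=write_quote_char,
                            quoting=csv.QUOTE_ALL)
        # Header only in write mode, or in append mode on an empty file.
        if write_header and (mode == 'w' or (mode == 'a' and file_is_empty)):
            writer.writerow(field_names)
        writer.writerows(rows)

def read_rows(file_name, header_lines=0, quote_char='"'):
    """Hypothetical reader that skips the first header_lines rows."""
    with open(file_name, newline='') as f:
        reader = csv.reader(f, quotechar=quote_char)
        for _ in range(header_lines):
            next(reader, None)  # discard one header row
        return list(reader)

path = os.path.join(tempfile.mkdtemp(), 'out.csv')
write_rows(path, [['1995', 'smith']], ['year', 'surname'],
           mode='w', write_header=True)
write_rows(path, [['1996', 'miller']], ['year', 'surname'],
           mode='a', write_header=True)  # file not empty: no second header
print(read_rows(path, header_lines=1))
# [['1995', 'smith'], ['1996', 'miller']]
```

Because the append call finds a non-empty file, only the first call writes a header line, so the file ends up with exactly one header row.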
The following example shows how to initialise a CSV data set and how
to access it in read mode. It assumes that the dataset.py module has
been imported with the command import dataset.
```python
# ====================================================================
mydata = dataset.DataSetCSV(name = 'hospital-data',
                            description = 'Hospital data from 1990-2000',
                            access_right = 'read',
                            header_lines = 1,
                            file_name = './data/hospital.csv',
                            fields = {'year':0, 'surname':1, 'givenname':2,
                                      'dob':12, 'address':7, 'postcode':8,
                                      'state':9},
                            fields_default = '',
                            strip_fields = True,
                            missing_values = ['','missing'])

print(mydata.num_records)  # Print total number of records

first_record = mydata.read_record()  # Returns one record

hundred_records = mydata.read_records(1000,100)  # Read 100 records
ten_records = mydata.read_records(2000,10)  # Read another 10 records

mydata.finalise()  # Close file, finalise access to data set
```
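Since the data set allows sequential access only, reading records at an earlier position than the current one implies starting again from the top of the file. The class below is a hypothetical, self-contained sketch of that behaviour using the standard csv module. It assumes that read_records(first, number) returns number records starting at zero-based record first, and that strip_fields strips surrounding whitespace; missing-value handling (missing_values, fields_default) is left out because its exact behaviour is not described here.

```python
import csv
import io
import itertools

class SequentialCSVReader:
    """Sequential-only reader over CSV text (hypothetical, not dataset.py)."""

    def __init__(self, text, fields, header_lines=0, strip_fields=True):
        self.text = text
        self.fields = fields          # {field_name: zero-based column}
        self.header_lines = header_lines
        self.strip_fields = strip_fields
        self._rewind()

    def _rewind(self):
        """Start again at the first data row (the only way to go back)."""
        reader = csv.reader(io.StringIO(self.text))
        # Skip header rows so that record 0 is the first data row.
        self._rows = itertools.islice(reader, self.header_lines, None)
        self._pos = 0  # index of the next record to be returned

    def _to_record(self, row):
        if self.strip_fields:
            row = [value.strip() for value in row]
        return {name: row[col] for name, col in self.fields.items()}

    def read_record(self):
        """Return the next record in file order, or None at end of file."""
        row = next(self._rows, None)
        if row is None:
            return None
        self._pos += 1
        return self._to_record(row)

    def read_records(self, first, number):
        """Return up to `number` records starting at record `first`."""
        if first < self._pos:
            self._rewind()  # cannot move backwards, so start again
        skip = first - self._pos
        # Skip forward to record `first`, then take `number` records.
        rows = list(itertools.islice(self._rows, skip, skip + number))
        self._pos = first + len(rows)
        return [self._to_record(row) for row in rows]

data = "year,surname\n1990, smith \n1991,jones\n1992,brown\n"
reader = SequentialCSVReader(data, {'year': 0, 'surname': 1}, header_lines=1)
first_record = reader.read_record()    # record 0
last_record = reader.read_records(2, 1)  # skips record 1, returns record 2
first_two = reader.read_records(0, 2)    # forces a rewind to the start
```

Backward reads cost a fresh pass from the start of the file, which is the price of sequential-only access; forward reads simply skip the intervening rows.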