13.2 CSV Data Set Implementation

Text files with comma-separated values are common, as they are a portable way to store data from spreadsheets or database tables. Such files often have the file extension '.csv'. This data set implementation allows sequential access only.

The fields attribute of a CSV data set must be a dictionary in which the keys are the field names and the values are the corresponding column numbers (starting with 0).
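To illustrate how such a dictionary relates field names to columns, the following minimal sketch applies a name-to-column mapping to rows parsed with Python's standard csv module. It is not Febrl's own code: the field names, the sample data and the helper row_to_record are hypothetical, and the sketch assumes a current Python interpreter.

```python
import csv
import io

# Hypothetical field mapping: field name -> column number (starting with 0)
fields = {'year': 0, 'sex': 1, 'postcode': 2}

def row_to_record(row, fields, default=''):
    """Build a {field_name: value} dictionary from one parsed CSV row.

    Columns that are missing from a short row get the default value.
    """
    record = {}
    for name, col in fields.items():
        record[name] = row[col] if col < len(row) else default
    return record

# Illustrative in-memory CSV data (the second row lacks the third column)
sample = io.StringIO(u"1995,f,2600\n1996,m\n")

records = [row_to_record(row, fields) for row in csv.reader(sample)]
```

Here records[0]['year'] holds the value from column 0 of the first row, and the short second row receives the default value for its missing postcode column.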

Additional attributes (besides the general data set attributes as described above) for a CSV data set are

The following example shows how to initialise a CSV data set and how to access it in read mode. It is assumed that the dataset.py module has been imported using the import dataset command.

# ====================================================================

mydata = dataset.DataSetCSV(name = 'hospital-data',
                     description = 'Hospital data from 1990-2000',
                    access_right = 'read',
                    header_lines = 1,
                       file_name = './data/hospital.csv',
                          fields = {'year':0},  # Column numbers start at 0
                  fields_default = '',
                    strip_fields = True,
                  missing_values = ['','missing'])

print mydata.num_records  # Print total number of records

first_record = mydata.read_record()  # Returns one record

hundred_records = mydata.read_records(1000,100)  # Read 100 records
ten_records = mydata.read_records(2000,10)  # Read another 10 records

mydata.finalise()  # Close file, finalise access to data set

Note: In its current implementation, a CSV data set can only consist of one underlying CSV text file. The handling of multiple files as one data set will be implemented in a future version of Febrl.