13.3 SQL Data Set Implementation

This data set implementation allows access to the popular MySQL open source database system. It uses the Python module MySQLdb which has to be installed separately^13.2.

Note: Support for a wider range of SQL databases will be added to Febrl in a future release. More information on the Python database API can be found at http://www.python.org/topics/database/, including a list of Python modules for various (commercial and open source) databases.

The SQL data set implementation currently allows sequential access only.

The fields attribute of an SQL data set must be a dictionary where the keys are the field names and the values are the names of the corresponding table columns in the SQL database.

If an SQL data set is initialised in write or append mode it is assumed that the table is already created and contains the necessary columns. The main reason for this is when creating an SQL table it is necessary to give the types and length of all table columns, information which are not available when a data set is initialised.

Additional attributes (besides the general data set attributes as described above) for a SQL data set are

table_name
A string containing the name of the underlying SQL table.
database_name
The name of the database that contains the table to be accessed.
database_user
The name of the database user.
database_password
The corresponding database password for the given user. If this password is not set when the data set is initialised, it will be queried interactively when the connection to the database is established.
database_block_size
For efficiency reasons reading records from a database table is done in blocks. The default number of records in a block is set to 10,000, but this value can be adjusted with this attributed.

The following example shows how to initialise a SQL data set and how to access it in read mode. It is assumed that the dataset.py module has been imported using the import dataset command.

# ====================================================================

mydata = dataset.DataSetSQL(name = 'hospital-data',
                     description = 'Hospital data from 1990-2000',
                    access_right = 'read',
                   database_name = 'hospital-data',
                   database_user = 'exampleuser',
                      table_name = 'hospital',
                          fields = {'year':'year',
                                    'surname':'sname',
                                    'givenname':'gname',
                                    'dob':'dateofbirth',
                                    'address':'wayfare',
                                    'postcode':'pcode',
                                    'state':'territory'},
                  fields_default = '',
                    strip_fields = True,
                  missing_values = ['','missing'])

print mydata.num_records  # Print total number for records

first_record = mydata.read_record()  # Returns one record

hundred_records = mydata.read_records(1000,100)  # Read 100 records
ten_records = mydata.read_records(2000,10)  # Read another 10 records

mydata.finalise()  # Close file, finalise access to data set

Note: In its current implementation, an SQL data set can only consist of one underlying database table. The handling of multiple tables as one data set will be implemented in a future version of Febrl.

Footnotes

... separately ^13.2: Available for download from http://sourceforge.net/projects/mysql-python