Within a project object, one method (routine) is available to
define and start a standardisation process. Assuming that a
project object has been created (by copying and modifying the
template module project-standardise.py), and that input and output
data sets, as well as component standardisers and a record standardiser,
have been defined, a data set can be standardised with a single call
to the method standardise, as shown in the following example.
# ====================================================================
myproject.standardise(input_dataset = hospital_data,
                      output_dataset = clean_hospital_data,
                      rec_standardiser = hospital_standardiser,
                      first_record = 0,
                      number_records = 100000)
In this example, the first 100,000 records (starting at record number 0) of a fictitious hospital data set are standardised and written into an output data set, which is assumed to have been initialised.
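The first_record and number_records arguments select a contiguous range of records, so a large data set could in principle be standardised in several calls. The following is only a minimal sketch of that idea: the helper record_blocks and the figures used are hypothetical and not part of the project module, and whether block-wise calls are advisable depends on the data set implementation in use.

```python
def record_blocks(total_records, block_size):
    """Yield (first_record, number_records) pairs covering total_records.

    Hypothetical helper for illustration only; it is not part of the
    project module described in this section.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    first = 0
    while first < total_records:
        # The last block may be shorter than block_size.
        yield first, min(block_size, total_records - first)
        first += block_size


# Possible use, assuming myproject and the data sets from the example
# above have been defined and initialised:
#
# for first, count in record_blocks(250000, 100000):
#     myproject.standardise(input_dataset = hospital_data,
#                           output_dataset = clean_hospital_data,
#                           rec_standardiser = hospital_standardiser,
#                           first_record = first,
#                           number_records = count)
```

Each pair produced by the helper maps directly onto the first_record and number_records arguments of one call.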
The following arguments need to be defined for the standardisation process.
input_dataset
The input data set, which must be initialised in read access mode.
output_dataset
The output data set, which must be initialised in write, readwrite or append access mode.
This output data set can be any data set implementation except a
memory based data set (as all standardised records would be lost
once the program finishes). See Chapter 13 for more information
on data set implementations.
rec_standardiser
first_record
The number of the first record to be standardised. If set to
None (default), the first record (i.e. record with
number 0) is taken.
number_records