6.10 Starting a Standardisation Process

Within a project object, one method (routine) is available to define and start a standardisation process. Assuming that a project object has been created (by copying and modifying the template module project-standardise.py), and that input and output data sets as well as component standardisers and a record standardiser have been defined, a data set can be standardised with a single call to the method standardise, as shown in the following example.

# ====================================================================

myproject.standardise(input_dataset = hospital_data,
                     output_dataset = clean_hospital_data,
                   rec_standardiser = hospital_standardiser,
                       first_record = 0,
                     number_records = 100000)

In this example, 100,000 records of a fictitious hospital data set, starting at record 0, are standardised and written into the output data set (which is assumed to have been initialised).
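To make the roles of first_record and number_records concrete, the following self-contained sketch mimics the call on a plain Python list. It is only an illustration under stated assumptions: record_range and standardise_sketch are hypothetical helpers, not part of the project module, and the sketch assumes that the two arguments select the records numbered first_record up to first_record + number_records - 1.

```python
# Hypothetical illustration only: these helpers are NOT part of the
# project module. They assume that 'first_record' and 'number_records'
# together select a contiguous block of records.

def record_range(first_record, number_records):
    """Return (start, stop) so that records start .. stop-1 are selected."""
    if first_record < 0 or number_records < 0:
        raise ValueError("first_record and number_records must not be negative")
    return first_record, first_record + number_records


def standardise_sketch(input_records, rec_standardiser,
                       first_record, number_records):
    """Apply 'rec_standardiser' to the selected block and return the results."""
    start, stop = record_range(first_record, number_records)
    # Slicing stops at the end of the list if fewer records are available
    return [rec_standardiser(rec) for rec in input_records[start:stop]]


# Toy data set and a toy standardiser (strip white space, lower case)
raw_records = ["  smith ", "JONES", " Miller", "brown  "]

cleaned = standardise_sketch(input_records=raw_records,
                             rec_standardiser=lambda rec: rec.strip().lower(),
                             first_record=1,
                             number_records=2)
print(cleaned)
```

Under this assumption, a large data set could be processed in several consecutive blocks by increasing first_record by number_records between calls.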

The following arguments need to be defined for the standardisation process.