Within a project object, one method (routine) is available to define and start a standardisation process. Assuming that a project object has been created (by copying and modifying the template module project-standardise.py) and that input and output data sets, as well as component standardisers and a record standardiser, have been defined, a data set can be standardised with a single call to the method standardise, as shown in the following example.
# ====================================================================

myproject.standardise(input_dataset = hospital_data,
                      output_dataset = clean_hospital_data,
                      rec_standardiser = hospital_standardiser,
                      first_record = 0,
                      number_records = 100000)
In the given example, 100,000 records in a fictitious hospital data set are standardised and written into an output data set (assuming it has been initialised).
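To make the flow of such a call concrete, the following is a minimal, self-contained sketch. It does not use the library: ToyDataset, ToyRecordStandardiser and ToyProject are hypothetical stand-ins written for this illustration, and the field names and cleaning functions are made up. It only assumes that the call reads the requested range of records from the input data set, passes each one through the record standardiser, and writes the result to the output data set.

```python
# All names below (ToyDataset, ToyRecordStandardiser, ToyProject) are
# hypothetical stand-ins for illustration, not the library's classes.

class ToyDataset:
    """A list of dictionary records with a declared access mode."""

    def __init__(self, access_mode, records=None):
        self.access_mode = access_mode
        self.records = list(records or [])


class ToyRecordStandardiser:
    """Cleans a record by applying one function per field."""

    def __init__(self, component_standardisers):
        # Maps a field name to the function that cleans that field.
        self.component_standardisers = component_standardisers

    def standardise(self, record):
        return {field: clean(record[field])
                for field, clean in self.component_standardisers.items()}


class ToyProject:
    def standardise(self, input_dataset, output_dataset, rec_standardiser,
                    number_records, first_record=None):
        # None for first_record means: start at record number 0.
        start = 0 if first_record is None else first_record
        for record in input_dataset.records[start:start + number_records]:
            output_dataset.records.append(rec_standardiser.standardise(record))


hospital_data = ToyDataset("read", [
    {"surname": "  SMITH ", "suburb": "dickson"},
    {"surname": "o'brien", "suburb": " Acton"},
    {"surname": "Nguyen ", "suburb": "TURNER "},
])
# Kept in memory only so that this sketch is self-contained.
clean_hospital_data = ToyDataset("write")

hospital_standardiser = ToyRecordStandardiser({
    "surname": lambda value: value.strip().lower(),
    "suburb": lambda value: value.strip().lower(),
})

myproject = ToyProject()
myproject.standardise(input_dataset=hospital_data,
                      output_dataset=clean_hospital_data,
                      rec_standardiser=hospital_standardiser,
                      first_record=0,
                      number_records=2)

for record in clean_hospital_data.records:
    print(record)
```

Only the first two of the three toy records are standardised here, because number_records is 2. The input data set itself is left unchanged.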
The following arguments need to be defined for the standardisation process.
input_dataset
The input data set, in read access mode.
output_dataset
The output data set, in write, readwrite or append access mode. This output data set can be any data set implementation except a memory based data set (as all standardised records would be lost once the program finishes). See Chapter 13 for more information on data set implementations.
rec_standardiser
first_record
If set to None (default), the first record (i.e. the record with number 0) is taken.
number_records
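The constraints on these arguments can be summarised as a small check. This is only a sketch: the function check_standardise_args and its parameters are hypothetical and not part of the library. The access mode names and the None default for first_record follow the descriptions above, and whether the library itself performs such checks is not stated here.

```python
# Hypothetical helper for illustration only. The access mode names follow the
# argument descriptions; the function name and parameters are assumptions.

INPUT_MODES = {"read"}
OUTPUT_MODES = {"write", "readwrite", "append"}


def check_standardise_args(input_mode, output_mode, output_in_memory,
                           first_record=None):
    """Return the effective first record number, or raise ValueError."""
    if input_mode not in INPUT_MODES:
        raise ValueError("input data set must be in read access mode")
    if output_mode not in OUTPUT_MODES:
        raise ValueError("output data set must be in write, readwrite "
                         "or append access mode")
    if output_in_memory:
        # Standardised records would be lost once the program finishes.
        raise ValueError("output data set must not be memory based")
    # None (the default) selects the first record, which has number 0.
    return 0 if first_record is None else first_record


print(check_standardise_args("read", "append", output_in_memory=False))
print(check_standardise_args("read", "write", output_in_memory=False,
                             first_record=500))
```

A check like this could be run before a long standardisation job starts, so that a wrongly initialised data set is reported immediately rather than after many records have been processed.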