Phone numbers are cleaned and standardised using a regular-expression and rule-based approach. A phone number can consist of a country code (possibly with an IDD - international direct dial - prefix), followed by an area code, then the actual number (with its number of digits depending upon the country and sometimes even the area), and possibly an extension. The routines related to phone number standardisation are implemented in the phonenum.py module.
Assuming the input phone number is given in one string, the Febrl phone number standardiser parses this number into the five output fields shown in Table 6.3. The phone cleaning and parsing method has a list of all international country codes built in, as well as two routines to specifically parse Australian or Canadian/US phone numbers.
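The parsing steps described above can be illustrated with a minimal Python sketch. This is not the code in phonenum.py: the function name parse_australian_phone, the three-entry COUNTRY_CODES table, the extension pattern, and the choice to fall back to Australia when no country code is present are all assumptions made for illustration only.

```python
import re

# Hypothetical, heavily truncated lookup table (an assumption for this sketch);
# a real table would cover all international country codes.
COUNTRY_CODES = {'61': 'australia', '1': 'canada/usa', '44': 'united kingdom'}

# Trailing extension markers such as "ext 12", "ext. 12" or "x12".
EXT_RE = re.compile(r'\s*(?:ext\.?|x)\s*(\d+)\s*$', re.IGNORECASE)


def parse_australian_phone(raw):
    """Split a phone number string into five fields (illustrative only).

    Returns [country_code, country_name, area_code, number, extension].
    """
    country_code, country_name, area_code, extension = '', '', '', ''

    # 1. Split off a trailing extension, if there is one.
    match = EXT_RE.search(raw)
    if match:
        extension = match.group(1)
        raw = raw[:match.start()]

    # 2. Detect a leading '+' or the Australian IDD prefix '0011',
    #    then keep digits only.
    stripped = raw.strip()
    international = stripped.startswith('+')
    digits = re.sub(r'\D', '', stripped)
    if not international and digits.startswith('0011'):
        international = True
        digits = digits[4:]

    # 3. For international numbers, match the longest known country code.
    if international:
        for length in (3, 2, 1):
            if digits[:length] in COUNTRY_CODES:
                country_code = digits[:length]
                country_name = COUNTRY_CODES[country_code]
                digits = digits[length:]
                break
    else:
        # Assumption: numbers without a country code are treated as Australian.
        country_code, country_name = '61', 'australia'

    # 4. Australian rule: a two-digit area code (trunk '0' plus one digit)
    #    followed by an eight-digit number.
    if country_name == 'australia':
        if len(digits) == 9:  # trunk '0' was dropped after the country code
            digits = '0' + digits
        if len(digits) == 10 and digits.startswith('0'):
            area_code, digits = digits[:2], digits[2:]

    return [country_code, country_name, area_code, digits, extension]
```

For example, parse_australian_phone('+61 2 6125 5690 ext 12') returns ['61', 'australia', '02', '61255690', '12']. Numbers from countries other than Australia are returned with an empty area code in this sketch, because splitting them correctly requires country-specific rules.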
The following arguments need to be set when a phone number standardiser is initialised.
name
description
input_fields
output_fields
A list of five field names, one for each of the output fields. Any entry can be set to None if no output is to be written to that field (for example if one is not interested in the country code and name), as long as at least one output field is defined (i.e. not None).
default_country
Either 'australia' or 'canada/usa'. The default value is 'australia'.
The following example code shows how a phone number standardiser is initialised.
# ====================================================================

phone_std = PhoneNumStandardiser(name = 'Phone-Num-std',
                                 description = 'Phone number standardiser',
                                 input_fields = 'phone_num',
                                 output_fields = ['phone_country_code',
                                                  'phone_country_name',
                                                  'phone_area_code',
                                                  'phone_number',
                                                  'phone_extension'],
                                 default_country = 'australia')
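The None convention for output_fields can be illustrated with a small stand-alone sketch. The helper assign_output_fields is hypothetical and not part of Febrl; it only shows one way in which parsed values could be written to the named fields, skipping entries set to None and rejecting a list in which every entry is None.

```python
def assign_output_fields(output_fields, parsed_values):
    """Map parsed values onto output field names (hypothetical helper).

    Entries of output_fields that are None are skipped, so the
    corresponding parsed value is not written anywhere.
    """
    if len(output_fields) != len(parsed_values):
        raise ValueError('Need exactly one output field entry per parsed value')

    # At least one output field has to be defined (i.e. not None).
    if all(field is None for field in output_fields):
        raise ValueError('At least one output field must be defined')

    return {field: value
            for field, value in zip(output_fields, parsed_values)
            if field is not None}


# Country code and country name are not of interest here, so they are None.
fields = [None, None, 'phone_area_code', 'phone_number', 'phone_extension']
values = ['61', 'australia', '02', '61255690', '']
record = assign_output_fields(fields, values)
```

Here record contains only the keys 'phone_area_code', 'phone_number' and 'phone_extension'.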