A tagging look-up table file contains one or more blocks of entries,
with all entries in a block being assigned the same tag. Tagging
look-up table files should have the file extension '.tbl'. The
format of these files is as follows:
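- A block starts with a line of the form tag=<tag>, which sets the
  tag for all entries that follow, up to the next tag line. A comment
  (starting with a hash character '#') is allowed after the assignment.
- Each entry is a line of the form key : values, where values is a
  comma-separated list that may be empty.
- Comment lines start with a hash '#' character.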
# ====================================================================

tag=<SP>  # Tag for separator elements
and      :
or       :
known as : kn as, kn, known

tag=<BO>  # Tag for 'baby of' and similar sequences
baby        :
baby of     :
daughter    :
daughter of :
son         :
son of      :

tag=<NE>  # Tag for word 'nee' (born as) or surname or givenname (?)
nee :

# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

tag=<TR>  # Tag for territory words
other territories : o/t, o t, other territory, other terr
new south wales   : n s w, new s w, new south w, nsw, n south w, n south wales, new south wa, n south wa, new s wa, n s wales, new s wales
queensland        : q l d, q land, queen land, queens land, qld, queenland
south australia   : s a, s australia, s australian, sa, south australian, southern australia, southern australian
victoria          : vi, vict, vic
western australia : w australia, w australian, wa, western australian, west australia, west australian
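The file format above can be read with a few lines of Python. The following is a minimal sketch only, not the parser used by lookup.py: the function name parse_tag_table is hypothetical, and the sketch assumes that a '#' never occurs inside an entry and that each entry fits on a single line.

```python
def parse_tag_table(text):
    """Parse tagging look-up table text into (tag, key, values) records.

    Hypothetical helper for illustration; not part of lookup.py.
    """
    records = []
    current_tag = None
    for raw_line in text.splitlines():
        # Assumption: everything from the first '#' onwards is a comment.
        line = raw_line.split('#', 1)[0].strip()
        if not line:
            continue  # blank line or pure comment line
        if line.startswith('tag=<') and line.endswith('>'):
            # A tag line starts a new block of entries.
            current_tag = line[len('tag=<'):-1]
            continue
        if current_tag is None:
            raise ValueError('entry before first tag line: %r' % raw_line)
        key, sep, values = line.partition(':')
        if not sep:
            raise ValueError('missing colon in entry: %r' % raw_line)
        # The value list is comma separated and may be empty.
        value_list = [v.strip() for v in values.split(',') if v.strip()]
        records.append((current_tag, key.strip(), value_list))
    return records


sample = """\
# Example table
tag=<SP>  # Tag for separator elements
and :
known as : kn as, kn, known

tag=<TR>  # Tag for territory words
victoria : vi, vict, vic
"""

records = parse_tag_table(sample)
for record in records:
    print(record)
```

Each record keeps the tag of the block it was read from, so entries from several blocks can be collected in one list.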
It is possible to load more than one tagging look-up table file into one combined tagging look-up table by giving a list of file names when the table is loaded, as shown in the loading example further below. If an entry is listed in different files with different tags, an error is triggered.
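The conflict rule for combined tables can be illustrated as follows. This is a sketch under stated assumptions, not the lookup.py implementation: each loaded file is modelled as a plain dictionary mapping an entry to its tag, the function name merge_tag_tables is hypothetical, and the exception type is chosen for illustration only.

```python
def merge_tag_tables(tables):
    """Combine several entry-to-tag dictionaries into one.

    Hypothetical helper; raises ValueError if an entry appears with
    two different tags (the error type is an assumption).
    """
    combined = {}
    for table in tables:
        for entry, tag in table.items():
            if entry in combined and combined[entry] != tag:
                raise ValueError('entry %r has tags %r and %r'
                                 % (entry, combined[entry], tag))
            combined[entry] = tag
    return combined


given_names = {'peter': 'GM', 'mary': 'GF'}
surnames = {'smith': 'SN'}
merged = merge_tag_tables([given_names, surnames])
print(sorted(merged.items()))
```

An entry that appears in two files with the same tag is accepted in this sketch; only differing tags are treated as a conflict.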
The default value for the attribute default is an empty string '',
i.e. if a value that does not exist in the table is looked up, an
empty string is returned. The default value can be changed when a
tagging look-up table is initialised using the default argument, as
shown in the loading example further below.
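The behaviour of the default attribute resembles a dictionary that returns a fixed value for unknown keys. The class below is a hypothetical stand-in written for illustration; it is not TagLookupTable, and it only assumes the behaviour described above.

```python
class DefaultingTable(dict):
    """Dictionary that returns a fixed default for unknown keys.

    Hypothetical stand-in for illustration; not the lookup.py class.
    """

    def __init__(self, default=''):
        dict.__init__(self)
        self.default = default

    def __missing__(self, key):
        # Called by dict.__getitem__ when the key is not present.
        # The key is not added to the table.
        return self.default


plain = DefaultingTable()
plain[('peter',)] = ('peter', 'GM')
print(repr(plain[('xyg0542w',)]))    # unknown key, default ''

custom = DefaultingTable(default='missing')
print(repr(custom[('xyg0542w',)]))   # unknown key, default 'missing'
```

Because the unknown key is not stored, a later membership test on the table still reports it as absent.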
After one or more tagging files have been loaded into a tagging
look-up table, the attribute max_key_length is set to the maximum
length in words of all keys in the look-up table. If, for example,
the longest key in a look-up table is 'south west rocks', then the
value of max_key_length would be 3.
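The word count behind max_key_length can be reproduced as follows. This is a sketch that assumes keys are whitespace-separated strings; the function name longest_key_in_words is hypothetical and the code is not taken from lookup.py.

```python
def longest_key_in_words(keys):
    """Return the largest number of words found in any key (0 if none)."""
    return max((len(key.split()) for key in keys), default=0)


keys = ['sydney', 'new south wales', 'south west rocks', 'wagga wagga']
print(longest_key_in_words(keys))
```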
Assuming the lookup.py module has been imported using the import
lookup command, an example tagging look-up table can be initialised
and loaded from several files as shown in the following example. It
is also assumed that the febrl.py module has been imported so that
the directory separator character 'dirsep' is available (as used in
the example below).
# ====================================================================

name_tagging_table = lookup.TagLookupTable(name = 'NameTagTable',
                                           default = 'missing')

name_tagging_table.load(['data'+dirsep+'givenname_f.tbl',
                         'data'+dirsep+'givenname_m.tbl',
                         'data'+dirsep+'name_prefix.tbl',
                         'data'+dirsep+'name_misc.tbl',
                         'data'+dirsep+'saints.tbl',
                         'data'+dirsep+'surname.tbl',
                         'data'+dirsep+'title.tbl'])

print name_tagging_table.length
print name_tagging_table.max_key_length

print name_tagging_table[('peter',)]  # Prints: ('peter', 'GM')

print name_tagging_table['xyg0542w']  # Assume not in table, 'missing'
                                      # will be returned