A tagging look-up table file contains one or more blocks of entries,
with all entries in a block being assigned the same tag. Tagging
look-up table files should have the file extension '.tbl'. The
format of these files is as follows:
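A block starts with a line of the form tag=<tag>. A comment (starting
with a hash character '#') can be written on the same line after the
assignment.
Entries are written in the form key : values, where values is a
comma separated list of variations of the key (the list can be empty).
Comment lines start with a hash '#' character.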
# ====================================================================
tag=<SP> # Tag for separator elements
and :
or :
known as : kn as, kn, known
tag=<BO> # Tag for 'baby of' and similar sequences
baby :
baby of :
daughter :
daughter of :
son :
son of :
tag=<NE> # Tag for word 'nee' (born as) or surname or givenname (?)
nee :
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
tag=<TR> # Tag for territory words
other territories : o/t, o t, other territory, other terr
new south wales : n s w, new s w, new south w, nsw, n south w,
                  n south wales, new south wa, n south wa,
                  new s wa, n s wales, new s wales
queensland : q l d, q land, queen land, queens land, qld,
             queenland
south australia : s a, s australia, s australian, sa,
                  south australian, southern australia,
                  southern australian
victoria : vi, vict, vic
western australia : w australia, w australian, wa,
                    western australian, west australia,
                    west australian
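To make the file format concrete, the following is a minimal standalone Python 3 sketch of how such a file could be read. It is not the Febrl implementation: the function name parse_tagging_lines is hypothetical, the (key, tag) result pairs are an assumption, and treating any entry line without a ':' as a continuation of the previous value list is an assumption based only on the layout of the example above.

```python
def parse_tagging_lines(lines):
    """Return a dict mapping each key and variation to (key, tag).

    Hypothetical helper, not part of Febrl.
    """
    table = {}
    tag = None  # tag of the block currently being read
    key = None  # key of the entry currently being read
    for raw in lines:
        # Drop anything from the first '#' onwards, then trim whitespace.
        line = raw.split('#', 1)[0].strip()
        if not line:
            continue
        if line.startswith('tag=<') and line.endswith('>'):
            tag = line[len('tag=<'):-1]
            key = None
            continue
        if tag is None:
            raise ValueError('entry before any tag=<...> line: %r' % raw)
        if ':' in line:
            key_part, values_part = line.split(':', 1)
            key = key_part.strip()
            table[key] = (key, tag)  # the key itself also gets the tag
        elif key is not None:
            values_part = line  # assumed continuation of the value list
        else:
            raise ValueError('values without a key: %r' % raw)
        for value in values_part.split(','):
            value = value.strip()
            if value:
                table[value] = (key, tag)
    return table


sample = [
    "tag=<SP> # Tag for separator elements",
    "and :",
    "known as : kn as, kn, known",
    "tag=<TR> # Tag for territory words",
    "new south wales : n s w, nsw,",
    "                  n s wales",
]
table = parse_tagging_lines(sample)
print(table['nsw'])
print(table['and'])
```

Under these assumptions, every variation of a key resolves to the same key and tag as the key itself.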
It is possible to load more than one tagging look-up table file into
one combined tagging look-up table by giving a list of file names
when the table is loaded, as shown in the example below. If an entry
is listed in different files with different tags, an error is
triggered.
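The conflict rule can be illustrated with a small standalone Python 3 sketch. This is an assumption about how such a check might look, not the Febrl code: the function name merge_tables, the dictionary layout (entry mapped to a (key, tag) pair), and the use of ValueError are all hypothetical.

```python
def merge_tables(tables):
    """Merge several {entry: (key, tag)} dictionaries into one.

    Hypothetical helper: raises ValueError if the same entry appears
    with two different tags.
    """
    combined = {}
    for table in tables:
        for entry, (key, tag) in table.items():
            if entry in combined and combined[entry][1] != tag:
                raise ValueError(
                    'entry %r has tags %r and %r'
                    % (entry, combined[entry][1], tag))
            combined[entry] = (key, tag)
    return combined


territories = {'nsw': ('new south wales', 'TR')}
more_territories = {'vic': ('victoria', 'TR')}
combined = merge_tables([territories, more_territories])
print(sorted(combined))
```

Merging a further table that lists 'nsw' with a tag other than 'TR' would raise the error in this sketch.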
The default value for the attribute default is an empty string
'', i.e. if a value that does not exist in the table is looked
up, an empty string is returned. The default value can be changed
when a tagging look-up table is initialised using the default
argument as shown in the example below.
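The behaviour of the default attribute can be mimicked in a few lines of standalone Python 3. The class DefaultingTable below is a hypothetical stand-in built on a plain dictionary, not the Febrl TagLookupTable class; it only illustrates the idea of returning a configurable value for unknown keys.

```python
class DefaultingTable(dict):
    """Hypothetical dictionary that returns a default for missing keys."""

    def __init__(self, entries=None, default=''):
        super().__init__(entries or {})
        self.default = default

    def __missing__(self, key):
        # Called by dict.__getitem__ when the key is not present.
        return self.default


plain = DefaultingTable({'peter': ('peter', 'GM')})
custom = DefaultingTable({'peter': ('peter', 'GM')}, default='missing')
print(repr(plain['xyg0542w']))
print(repr(custom['xyg0542w']))
```

In this sketch a failed look-up does not add the unknown key to the table.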
After one or more tagging files have been loaded into a tagging
look-up table, the attribute max_key_length is set to the length,
in words, of the longest key in the look-up table. If, for example,
the longest key in a look-up table is 'south west rocks', then the
value of max_key_length would be 3.
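As a worked illustration of this definition, the standalone Python 3 sketch below counts words per key and takes the maximum. It assumes keys are stored as whitespace separated strings, which may differ from the internal key representation used by Febrl; the function name is hypothetical.

```python
def max_key_length(keys):
    """Return the largest number of words found in any key (0 if none)."""
    return max((len(key.split()) for key in keys), default=0)


keys = ['sydney', 'new south wales', 'south west rocks', 'known as']
print(max_key_length(keys))
```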
Assuming the lookup.py module has been imported using the
import lookup command, an example tagging look-up table can be
initialised and loaded from several files as shown in the following
example. It is also assumed that the febrl.py module has
been imported so the directory separator character 'dirsep'
is available (as used in the example below).
# ====================================================================
name_tagging_table = lookup.TagLookupTable(name = 'NameTagTable',
                                           default = 'missing')
name_tagging_table.load(['data'+dirsep+'givenname_f.tbl',
                         'data'+dirsep+'givenname_m.tbl',
                         'data'+dirsep+'name_prefix.tbl',
                         'data'+dirsep+'name_misc.tbl',
                         'data'+dirsep+'saints.tbl',
                         'data'+dirsep+'surname.tbl',
                         'data'+dirsep+'title.tbl'])
print name_tagging_table.length
print name_tagging_table.max_key_length
print name_tagging_table[('peter',)] # Prints: ('peter', 'GM')
print name_tagging_table['xyg0542w'] # Assume not in table, 'missing'
# will be returned