The required inputs for this routine are a list of words and a list of
tags as created in the data cleaning and tagging steps (see
Chapter 6 for more details). Additionally, a third parameter,
first_name_comp, is needed. It gives the system a hint as to which name
component (given name or surname) is most likely to be at the beginning
of the input word list. For example,
after cleaning and tagging, the input name string
`del (van a.k.a. peter) miller, phd'
is converted into the following word and tag lists:
[`del', `|', `van', `known_as', `peter', `|', `miller', `,']
[`PR', `VB', `PR', `SP', `GM', `VB', `UN', `CO']
The first_name_comp parameter should be set to
'gname'
in this example, because this name starts with a given name followed by
the surname. Note that all title words (in this example 'phd') are
processed and removed from the input word and tag lists before the
rule-based name segmentation is started.
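
The three inputs described above can be illustrated with a minimal Python
sketch. It is not the routine itself: the helper name check_name_inputs is
hypothetical, and the value 'sname' for a surname-first hint is an
assumption, since only 'gname' appears in the text. The sketch only checks
that the word and tag lists line up and pairs each word with its tag.

```python
def check_name_inputs(words, tags, first_name_comp):
    """Validate the three inputs and pair each word with its tag.

    Hypothetical helper for illustration only; not part of the system.
    """
    # The word and tag lists are built in parallel, so they must match.
    if len(words) != len(tags):
        raise ValueError("word and tag lists must have the same length")

    # 'gname' is taken from the text; 'sname' is an assumed counterpart.
    if first_name_comp not in ("gname", "sname"):
        raise ValueError("first_name_comp must be 'gname' or 'sname'")

    return list(zip(words, tags))


# Word and tag lists from the example above (title word 'phd' already removed).
words = ["del", "|", "van", "known_as", "peter", "|", "miller", ","]
tags = ["PR", "VB", "PR", "SP", "GM", "VB", "UN", "CO"]

pairs = check_name_inputs(words, tags, "gname")
for word, tag in pairs:
    print(f"{tag:3s} {word}")
```

Pairing words and tags up front makes it easy to see which tag each rule
would act on when the segmentation walks through the list.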