C.1 Input

The required inputs for this routine are a list of words and a list of tags, as created in the data cleaning and tagging steps (see Chapter 6 for more details). A third parameter, first_name_comp, gives the system a hint about which name component (given name or surname) is most likely to appear at the beginning of the input word list. For example, after cleaning and tagging, the input name string

'del (van a.k.a. peter) miller, phd'
is converted into the following word and tag lists:
['del', '|',  'van', 'known_as', 'peter', '|',  'miller', ',']
['PR',  'VB', 'PR',  'SP',       'GM',    'VB', 'UN',     'CO']
The first_name_comp argument would be set to 'gname' in this example, because this name starts with a given name followed by the surname. Note that all title words (in this example 'phd') are processed and removed from the input word and tag lists before the rule-based name segmentation starts.
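The following Python sketch illustrates how the three inputs fit together and how title words could be dropped from both lists before segmentation. It is not the routine's actual code: the helper name prepare_segmentation_input, the title tag 'TI', the trailing 'phd'/'TI' pair in the pre-removal lists, and the surname value 'sname' are all assumptions made for illustration.

```python
from typing import List, Tuple

# Assumed tag for title words (hypothetical, for illustration only).
TITLE_TAG = "TI"

# Assumed values for first_name_comp: 'gname' (given name first)
# or 'sname' (surname first); only 'gname' appears in the text above.
VALID_FIRST_COMPS = ("gname", "sname")


def prepare_segmentation_input(words: List[str], tags: List[str],
                               first_name_comp: str) -> Tuple[List[str], List[str]]:
    """Hypothetical helper: validate the inputs and drop title words."""
    if len(words) != len(tags):
        raise ValueError("word and tag lists must have the same length")
    if first_name_comp not in VALID_FIRST_COMPS:
        raise ValueError("first_name_comp must be 'gname' or 'sname'")
    # Keep each (word, tag) pair together so the lists stay aligned.
    kept = [(w, t) for w, t in zip(words, tags) if t != TITLE_TAG]
    return [w for w, _ in kept], [t for _, t in kept]


# Example lists before title removal ('phd' tagged with the assumed 'TI').
words = ['del', '|', 'van', 'known_as', 'peter', '|', 'miller', ',', 'phd']
tags = ['PR', 'VB', 'PR', 'SP', 'GM', 'VB', 'UN', 'CO', 'TI']

clean_words, clean_tags = prepare_segmentation_input(words, tags, 'gname')
print(clean_words)  # ['del', '|', 'van', 'known_as', 'peter', '|', 'miller', ',']
print(clean_tags)   # ['PR', 'VB', 'PR', 'SP', 'GM', 'VB', 'UN', 'CO']
```

Filtering word/tag pairs together, rather than each list separately, keeps every word aligned with its tag, which the segmentation rules rely on when they read the two lists in parallel.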