Word spilling happens when data is entered into fixed-length input
fields and typing automatically continues in the next field once a
field is full. For example, given a given name field with a maximum
length of 10 characters and a surname field with 20 characters, the
name 'maria louisa miller' would be stored as given name 'maria loui'
and surname 'sa miller'. Checking for word spilling can be a useful
data cleaning step if a data set contains such data.
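The example above can be reproduced with a short sketch. The function name spill_into_fields and its arguments are hypothetical and only illustrate how continuous typing is cut at fixed field lengths; they are not part of any particular data entry system.

```python
def spill_into_fields(text, field_lengths):
    """Cut continuously typed text into consecutive fixed-length fields.

    Hypothetical helper: 'field_lengths' lists the maximum number of
    characters of each field, in the order the fields are filled.
    """
    fields = []
    pos = 0
    for length in field_lengths:
        # Each field takes as many characters as fit, the rest spills over.
        fields.append(text[pos:pos + length])
        pos += length
    return fields


# Given name field of 10 characters, surname field of 20 characters.
given_name, surname = spill_into_fields('maria louisa miller', [10, 20])
print(repr(given_name))  # 'maria loui'
print(repr(surname))     # 'sa miller'
```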
The word spilling check concatenates the last word of a field with the first word of the following field and then checks whether the concatenated word is known, i.e. whether it is listed in one of the available look-up tables. If so, the concatenated word is kept; otherwise (i.e. if the word is not known) a whitespace character is inserted between the two original words.
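The check described above might be sketched as follows. This is a minimal illustration under stated assumptions, not the actual implementation: the function name check_word_spilling is hypothetical, the look-up tables are represented by a plain Python set called known_words, and words are assumed to be separated by whitespace.

```python
def check_word_spilling(first_field, second_field, known_words):
    """Join two adjacent field values, repairing a spilled word if possible.

    Hypothetical sketch: 'known_words' stands in for the look-up tables.
    """
    first_words = first_field.split()
    second_words = second_field.split()

    # Nothing to concatenate if either field is empty.
    if not first_words or not second_words:
        return ' '.join(first_words + second_words)

    # Concatenate the word at the end of the first field with the
    # word at the beginning of the second field.
    candidate = first_words[-1] + second_words[0]

    if candidate in known_words:
        # Known word: keep the concatenated form.
        words = first_words[:-1] + [candidate] + second_words[1:]
    else:
        # Unknown word: keep the two original words, separated by a space.
        words = first_words + second_words

    return ' '.join(words)


known_words = {'maria', 'louisa', 'miller', 'peter', 'smith'}
print(check_word_spilling('maria loui', 'sa miller', known_words))
# maria louisa miller
print(check_word_spilling('peter', 'smith', known_words))
# peter smith
```

Using a set keeps the membership test cheap; a real system would consult whatever look-up tables are configured for the fields in question.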