The third type of look-up table file is a list of words with
corresponding frequency counts. These files contain two columns
separated by a comma, so they are simple CSV (comma separated
values) files, as created for example by a spreadsheet. The first
column contains words and the second column contains the corresponding
frequency counts (positive integers). These files should have the
file extension '.csv'. The following example is extracted from
a surname frequency look-up table.
# ====================================================================

dijkstra,3
miller,4325
smith,22540
A probability distribution for a given frequency look-up table is computed internally after loading such a file by summing up all the frequency counts and then dividing each frequency count by this sum.
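The normalisation described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the code used inside the lookup.py module; the function name to_probabilities is hypothetical, and the counts are the three entries from the surname example.

```python
# Frequency counts taken from the surname example above.
counts = {'dijkstra': 3, 'miller': 4325, 'smith': 22540}


def to_probabilities(freq_counts):
    """Divide each frequency count by the sum of all counts.

    Hypothetical helper for illustration only; it is not part of lookup.py.
    """
    total = sum(freq_counts.values())
    return dict((word, count / float(total))
                for word, count in freq_counts.items())


probs = to_probabilities(counts)
```

The resulting values sum to 1, and frequent words such as 'smith' receive a proportionally larger probability than rare ones such as 'dijkstra'.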
It is possible to load more than one frequency look-up table file into one combined frequency look-up table by giving a list of file names when the table is loaded, as shown in the example below. If an entry is listed in more than one file, its frequency counts are added up.
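The merging rule can be illustrated independently of the lookup.py module. The following is a minimal Python 3 sketch under stated assumptions: the function name merge_frequency_rows is hypothetical, the two in-memory sources stand in for real '.csv' files, and their counts are invented so that the combined value is easy to check.

```python
import csv
import io


def merge_frequency_rows(sources):
    """Combine several CSV sources of 'word,count' rows into one dict.

    Hypothetical helper for illustration only; it is not part of lookup.py.
    """
    combined = {}
    for source in sources:
        for row in csv.reader(source):
            if not row:
                continue  # skip blank lines
            word, count = row[0].strip(), int(row[1])
            # Counts for a word seen in an earlier source are added up.
            combined[word] = combined.get(word, 0) + count
    return combined


# In-memory stand-ins for two frequency files (made-up counts).
english = io.StringIO("miller,200\nsmith,950\n")
french = io.StringIO("miller,46\nleroc,42\n")

table = merge_frequency_rows([english, french])
```

Here 'miller' appears in both sources, so its combined count is 200 + 46 = 246, while 'smith' and 'leroc' keep the counts from the single source that lists them.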
The attribute 'default' is set to 1 unless specified otherwise, i.e.
if a value that does not exist in the table is looked up, the value 1
is returned. This value can be changed when a frequency look-up table
is initialised using the 'default' argument.
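The fall-back behaviour can be pictured with a small dictionary subclass. This is a hedged sketch only: the class FrequencyTableSketch is a hypothetical stand-in and makes no claim about how lookup.FrequencyLookupTable is implemented; it only mirrors the behaviour described in the text.

```python
class FrequencyTableSketch(dict):
    """Hypothetical stand-in for illustration; not the lookup.py class."""

    def __init__(self, default=1):
        dict.__init__(self)
        self.default = default  # value returned for unknown words

    def __getitem__(self, key):
        # Fall back to the default instead of raising KeyError.
        return self.get(key, self.default)


table = FrequencyTableSketch()  # default stays at 1
table['miller'] = 246

known = table['miller']         # word is in the table
unknown = table['deutschmann']  # word is not in the table

zero_table = FrequencyTableSketch(default=0)  # changed at initialisation
```

A default of 1 rather than 0 treats an unseen word as very rare instead of impossible, which avoids zero frequencies in later calculations.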
After a frequency look-up table has been loaded from one or more
files, the total sum of all frequency counts is stored in the
attribute 'sum'.
Assuming the lookup.py module has been imported using the
'import lookup' command, an example frequency look-up table can
be initialised and loaded from several files as shown in the following
example.
# ====================================================================

name_freq_table = lookup.FrequencyLookupTable(name = 'NameFreqTable')

name_freq_table.load(['surname_english.csv','surname_french.csv'])

print name_freq_table.sum
print name_freq_table.length

print name_freq_table['miller']       # Returns for example 246
print name_freq_table['leroc']        # Returns for example 42
print name_freq_table['deutschmann']  # Should return default value 1
                                      # assuming it's not in the table