The statistical evidence in the postmasters: Bates (Horizon Issues) trial

by Jeremy Dawson

The judgment in the Bates (Horizon Issues) trial is at https://www.bailii.org/ew/cases/EWHC/QB/2019/3408.html.
Dr Worden's first report (07 December 2018) to which I refer is at https://www.postofficetrial.com/2019/06/horizon-trial-post-office-independent.html.
For Nick Wallis's book, The Great Post Office Scandal, see https://bathpublishing.com/products/the-great-post-office-scandal-first.

Dr Robert Worden's evidence

In the Horizon Issues trial, statistical analyses were provided by the Post Office's expert witness, Dr Robert Worden. The judge, at para 805 of the judgment, quotes section 8.8.1 of Dr Worden's report (paras 759 to 766), as follows:

760. Because Post Office has had an average of 13,560 branches over the lifetime of Horizon, the total number of monthly branch accounts has been about 3 million.
761. Therefore, if a bug like the Suspense Account bug has occurred 16 times in the lifetime of Horizon, the chance of it having occurred in any given branch in any given month is about 16 in 3 million. [omitted text refines this calculation]
762. I have considered a bug similar to the suspense account bug, which occurred about 10 times, and had a mean financial impact of about £1000 per occurrence. How many similar bugs would be needed, to give a one in ten chance of one such bug occurring, with an impact of £1000, on a particular Claimant's branch in a particular month?
763. The answer, given by elementary arithmetic which I describe in section 8.5, is that there would need to be 50,000 of these distinct bugs.
764. So the Claimants cannot credibly assert that their shortfalls were caused by bugs in Horizon, unless there were something of the order of 50,000 such bugs.

This is complete nonsense. It is like arguing in a murder trial that because the homicide rate in the UK is only 1 per 100,000 per year, it is overwhelmingly unlikely that the defendant is guilty.

This is so obviously nonsense that it may be superfluous to ask what the flaw in the argument is, but here it is. You can use this calculation to get the probability that a randomly chosen person committed the murder in question. But the person on trial is not a person chosen randomly.

Likewise you could calculate the probability that a randomly chosen subpostmaster has been affected by a Horizon bug, or at least make a conscientious attempt to do so, as Dr Worden has done. But the subpostmasters who claim to have suffered unexplained shortfalls, possibly Horizon related, are not a randomly chosen subset.
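The calculation for a randomly chosen branch can be sketched using only the figures quoted above from paras 760 to 763. This is a minimal sketch, not Dr Worden's own method: the function names are hypothetical, and it leaves out the refinements he describes in section 8.5 of his report, which is presumably why the unrefined answer differs from his 50,000.

```python
from fractions import Fraction
import math

# Figures taken from the quoted paragraphs 760 and 762.
MONTHLY_BRANCH_ACCOUNTS = 3_000_000   # "about 3 million"
OCCURRENCES_PER_BUG = 10              # "occurred about 10 times"

def chance_per_branch_month(occurrences, accounts=MONTHLY_BRANCH_ACCOUNTS):
    """Chance that one bug hits one randomly chosen branch in one month."""
    return Fraction(occurrences, accounts)

def bugs_needed_linear(target, per_bug_chance):
    """Number of distinct bugs needed, simply adding up the small chances."""
    return target / per_bug_chance

def bugs_needed_exact(target, per_bug_chance):
    """Same question, treating the bugs as independent (an assumption)."""
    return math.log1p(-float(target)) / math.log1p(-float(per_bug_chance))

per_bug = chance_per_branch_month(OCCURRENCES_PER_BUG)
linear = bugs_needed_linear(Fraction(1, 10), per_bug)
exact = bugs_needed_exact(Fraction(1, 10), per_bug)
print(per_bug, linear, round(exact))
```

Either figure describes a branch picked at random from all branches; it says nothing about a group picked because they reported shortfalls.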

I'm told this statistical fallacy is surprisingly common. I would hope it is not common among expert witnesses. I'm sure it's not common among competent expert witnesses.

At this point there is not much more that needs to be said. But as I have spent so long reading the expert's report, the transcripts and the judgment, I'm going to say some more anyway. Skip it if you like, but do jump ahead to my heading "The Suspense Account" - there are some different angles I discuss there.

Fortunately the judge gets this exactly right. I suspect not all judges would. At para 766 he says

I deal with that point in further detail below at [821] and [822] below, but this amounts to an assumption by Dr Worden that a group of SPMs who specifically allege they have experienced the effects of bugs are to be treated, in statistical terms, as though they are a random group of SPMs of the same sample size drawn from the wider population of all SPMs. They plainly are not a randomly drawn sample of nearly 600 SPMs. They are a very specific group (or sample) of those who say their branch accounts have been impacted by, or have experienced, such incidents. In statistical terms, the correct term for the group is that they have a bias - they all allege that they have experienced the effects of bugs, errors and defects.

After recounting Dr Worden's statistical evidence (para 805) he goes into a long digression, and then (para 821) comes to the key issue:

However, the claimants are not a random sample of SPMs ... As a sample, they have already been filtered or selected in that these particular SPMs already complain of bugs, defects and errors in Horizon having affected their branch accounts. This means that they are not a random sample. The way this would be expressed in statistical terms is that the claimant SPMs do not accurately represent the population of SPMs as a whole (...). The claimants are essentially self-selected, from those who believe they have experienced shortfalls and discrepancies in their accounts from the impact of bugs, errors and defects ... The group has a bias, in statistical terms. They plainly cannot be treated, in statistical terms, as though they are a random group of 587 SPMs.

Exactly. But I'd add that this is such an elementary blunder that this guy should never be let near a witness-box again.

Dr Worden's report provides some other analyses, which essentially make the same error in different ways. His premise is that bugs in Horizon are spread equally across Post Office branches and over time, adjusted for how busy branches were, so that the effect of bugs on all branches would have been 160 times the effect of bugs on the claimants.

In section 8.7.9 he analyses known problems with Horizon and produces an estimate of the total effect of all bugs on Post Office branches. I assume for this discussion that it is possible to make a plausible estimate and that he has done so. Then his assumption that the effect is equally spread over branches leads him to estimate the total effect of bugs on the claimants' branches. This assumption is obviously unsound. Branches may well be equally likely to be affected, based on their characteristics such as size, location, etc, so if the claimants were a randomly chosen group of postmasters then his analysis would be fine. But the set of claimants consists of people who say they have been affected - they are not a randomly chosen subset.

In section 8.8.2 he does essentially the same thing, arguing that if bugs had affected all branches to the same extent as the claimants say their own branches were affected, then the Call Centre (which took calls about problems with Horizon) would have been inundated. His reasoning here has the same flaw.

In cross-examination, this issue was discussed several times.

On day 18, see the transcript at https://www.postofficetrial.com/2019/06/horizon-trial-day-18-transcript.html about 80% down, or search for "Penny Black". They agreed to assume that one person in 500,000 in the UK is a lady called Penny Black, and discussed the probability of finding one or more such at a party of 50 people. They then discussed the scenario of a party to which only people called Penny Black had been invited. Dr Worden said

"I should say generally that probability theory is what one uses in the absence of specific knowledge like you have just put to me, and that specific knowledge changes the whole ball game."

However, Dr Worden did not accept that this is relevant to his analyses about Horizon, essentially because, in his view, he should not assume that the claimants are correct in their stated belief that they have been adversely affected by bugs in Horizon.

This doesn't change the fact that the claimants are not a random set of postmasters, but a self-selected sample.
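The Penny Black arithmetic from day 18 shows how much the selection of the guests matters. The sketch below takes the agreed assumption (one person in 500,000 is called Penny Black) and, as a further simplifying assumption, treats the 50 guests at the first party as independent random draws from the population; the function name is hypothetical.

```python
def chance_of_at_least_one(share_with_name, guests):
    """Chance that at least one of the guests has the name."""
    return 1 - (1 - share_with_name) ** guests

# Party of 50 people drawn at random from the population.
random_party = chance_of_at_least_one(1 / 500_000, 50)

# Party to which only people called Penny Black were invited:
# every guest has the name, so the share among the invitees is 1.
invited_party = chance_of_at_least_one(1.0, 50)

print(f"random party:  {random_party:.6f}")
print(f"invited party: {invited_party:.6f}")
```

The first figure is about 1 in 10,000; the second is a certainty. The population rate is the same in both cases, and only the way the guests were chosen differs.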

But here is an analogy: since I mention coin-tossing later, I'll do so here. Imagine a number of coins have been tossed and are lying on the ground. Imagine then that someone, even somebody of doubtful honesty and worse eyesight, claims to have separated out those showing a "head" (leaving them showing the same side as they fell).

So, for a coin among those which he has selected, what is the probability that it shows a head? You may not accept it as being 100% but you're damn sure that you shouldn't treat it as 50%!
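How far from 50% depends on how reliable the selector is. The sketch below is illustrative only: it assumes a single hypothetical "accuracy" figure, the chance that the selector calls any given coin correctly (a head as a head, a tail as a tail), and applies Bayes' Theorem to a fair coin.

```python
from fractions import Fraction

def prob_head_given_selected(accuracy, prior_head=Fraction(1, 2)):
    """Chance that a coin in the selected pile really shows a head.

    accuracy is a hypothetical figure: the chance that the selector
    classifies a coin correctly, whichever side it shows.
    """
    picked_heads = prior_head * accuracy              # heads correctly picked
    picked_tails = (1 - prior_head) * (1 - accuracy)  # tails wrongly picked
    return picked_heads / (picked_heads + picked_tails)

for accuracy in (Fraction(1, 2), Fraction(6, 10), Fraction(9, 10)):
    print(accuracy, prob_head_given_selected(accuracy))
```

Under these assumptions the answer is 50% only when the accuracy is itself 50%, that is, when the selector is doing no better than guessing.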

Unless Dr Worden is saying that his opinion is based on the assumption that the claimants' evidence is so unreliable as to be quite worthless. Now here is a legal, not a statistical, point, and one not noted in the judgment, so I may be wrong: an expert opinion, when based on a particular view of the primary facts, should say so, and if the court comes to a different view of those primary facts, then the expert opinion becomes irrelevant. [UPDATE: actually, pretty much this point is made in the judgment, paras 831-2]

In any case, Dr Worden considers that he should disregard the claimants' evidence, and uses this assumption to construct a statistical argument denying the validity of the claimants' evidence, so that is a circular argument. [UPDATE: the judgment makes the same point about the circularity of Dr Worden's argument that Horizon has had a "good in-service record over 18 years", see para 819.9]

On day 19, see the transcript at https://www.postofficetrial.com/2019/06/horizon-trial-day-19-transcript_32.html about 40% down, or search for "tweeting". The barrister introduced a scenario similar to the Penny Black party, and then said

"I'm going to put a point to you that I'd be happy to put to my 13-year-old daughter, which is that when you look at a statistical sample the first thing you should do is look at the nature of the sample and how they were selected?"

The ensuing discussion led to Dr Worden saying

"the claimants are a self-selected sample and they selected themselves long after they suffered their shortfalls. So the point you are putting to me effectively is these people selected themselves and that somehow caused Horizon several years previously to rain bugs on them. And so the causation is completely the wrong way round between Horizon affecting the claimants and the claimants self-selecting. It doesn't make sense."

and later

"it [the fact that a postmaster believes that he/she has suffered in the way which is the subject of the proceedings] is not a material factor in whether Horizon during your tenure caused bugs to you."

After a bit more on this theme the judge, who perhaps sees the issue clearly, says "I think [this sequence of cross-examination] has probably gone on long enough."

So, is the direction of causation an issue? In a word, no. As in my murder trial analogy: the fact that the police and prosecutors have come to suspect a particular individual doesn't cause him to commit a murder some time previously.

Or another coin-tossing example. Suppose two coins are tossed, and you are interested in the probability that both show a "head".

A preliminary point here, on our intuitions about the notion of probability. If a coin is to be tossed in the future, then to say that there is a 50% probability that it will show a head has one meaning, most easily expressed like this: if you were to do it repeatedly, then about half of the trials would show a head.

If a coin has already been tossed, then its probability of being a head is either 100% or 0%; you just don't know which. To say that the probability of it being a head is 50% is a description not of the facts, but of your estimate of the facts. And then, after looking at the coin, your assessment of the facts will change: you will now say that the probability of it being a head is 100%, or that it is 0%, as the case may be. (And of course your looking at the coin doesn't cause it to be a head or not.)

So now consider two coins tossed. On your knowledge at this point, the probability of both being heads is 25% (50% squared). If someone looks at the first coin and tells you that it is a head, then the probability of both being heads is now 50%. (In the theory of probability, these are the prior and posterior probabilities of Bayes' Theorem. Using Dr Worden's words, this new knowledge changes the whole ball game, but probability theory is nonetheless still relevant.) The probability has changed, with no causation involved. Now a second scenario: you are told that the first coin tossed is in fact a double-headed coin. Again the probability of both being heads is 50%. Here there is causation involved: the fact of one coin being double-headed makes it more likely that both show heads.

But the numbers are the same in each scenario, and for the same reason. Whether causation is involved or not is irrelevant.
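The two scenarios can be checked by listing the equally likely outcomes. This is a minimal sketch; the function name and the representation of a coin as a string of its two faces are choices made for illustration.

```python
from fractions import Fraction
from itertools import product

def prob_both_heads(first_faces, second_faces, known_first=None):
    """Probability that both coins show heads, by counting outcomes.

    Each coin is given as a string of its two faces, e.g. "HT" for a
    fair coin or "HH" for a double-headed one. If known_first is set,
    only outcomes where the first coin shows that face are counted.
    """
    outcomes = [
        (a, b)
        for a, b in product(first_faces, second_faces)
        if known_first is None or a == known_first
    ]
    both_heads = [o for o in outcomes if o == ("H", "H")]
    return Fraction(len(both_heads), len(outcomes))

prior = prob_both_heads("HT", "HT")                    # nothing known yet
told_head = prob_both_heads("HT", "HT", known_first="H")  # information only
double_headed = prob_both_heads("HH", "HT")            # causation involved
print(prior, told_head, double_headed)
```

The count gives 1/4 before anything is known and 1/2 in each of the two scenarios, whether the change comes from information alone or from a physical cause.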

The Suspense Account

So what of the judge's "long digression" (paras 810-820), before he gets to the nub of why Dr Worden's approach is nonsense? Well, it can be put into the context of my murder trial analogy, thus:

(a) you should take into account that the homicide rate varies between male and female, young and old, and adapt your numbers to the age and sex of the accused (etc)

(b) the homicide rate may be (a lot) higher than you are actually aware of

Both points are correct, but tinker around the edges of the issue: neither point changes the fact that Dr Worden's approach is quite unsound. Which is why I call the passage a long digression.

But it is a really interesting digression. Because on point (b), the issue is that the Post Office ran a suspense account. This consisted of all the bits of money the Post Office had, but didn't know why it had them (or, one must infer, whether it should have them).

This really made me think WTF??? The Post Office runs an accounting system which can't tell where all their money has come from. So why does it think that its accounting system is good enough to tell it that missing amounts of money must be the fault of the postmasters?

Maybe I'm naive: maybe this is normal in such large organizations. Page 208 of Nick Wallis's book suggests that it would be a "miracle of finance" not to require such a suspense account. But I stand by saying that if their accounting system can't tell why they have the money they have, then it can't possibly be adequate to tell them why they are missing the money they are missing.

And I won't deny that the Post Office's accounting is probably better than my own. For example, I often find myself wondering where all the money I took out of an ATM a week ago has gone. But I don't go making accusations of theft against the visitors to my home during that week!

There is a further point here which I myself didn't pick up until reading Paul Marshall's submission to the Williams Inquiry: how can they be sure that none of the amounts in the suspense account are actually the very same amounts that are missing from the subpostmasters' accounts? If these could be the same amounts, then the Post Office was prosecuting subpostmasters for missing money which was actually in the Post Office's hands.

This is alluded to in para 810 of the judgment (quoting the claimants)

"38. The Defendant operated one or more suspense accounts in which it held unattributed surpluses including those generated from branch accounts. After a period of 3 years, such unattributed surpluses were credited to the Defendant's profits and reflected in its profit and loss accounts.
39. The Defendant thereby stood to benefit and/or did benefit from apparent shortfalls wrongly attributed to the Claimants which did not represent real losses to the Defendant."

and in Nick Wallis's book at page 381:

'The Post Office has improperly enriched itself through the decades,' he [Second Sight's Ron Warmington] thundered, 'with funds that have passed through its own suspense accounts. Had its own staff more diligently investigated in order to establish who were the rightful owners of those funds, they would have been returned to them, whether they were Post Office's customers or its Subpostmasters. ...'

This is also mentioned in a submission by Paul Marshall to the Williams Inquiry, see link to Paul-Marshall in https://www.postofficehorizoninquiry.org.uk/key-documents/written-submissions-november-2021 and see pg 6 item c.

Second Sight identified the existence of unattributed/unallocated funds/receipts in Post Office suspense accounts. This raises the important, indeed troubling, question as to whether the Post Office had in fact received monies for which it variously prosecuted, or pursued civil claim against, postmasters. That is an issue/question that to my knowledge remains unresolved. See further Second Sight Final Report April 2015 [at https://www.jfsa.org.uk/uploads/5/4/3/1/54312921/report_9th_april_2016.pdf (sic)] paragraphs [2.15], [2.16].

Coincidentally, an illustration from my own experience

I read Nick's book over four days. By quite a striking coincidence, during those very four days, I received a cheque for over £18000, paid to me in error. (It was for the redemption of a share fund investment - but the same amount had also been deposited into my bank account).

I am not making this up! Even though Dr Worden's arguments would conclude that I am, as follows (paragraph numbers are references to the analogous paragraphs in his first report):

if the financial institution paid me double then it would most likely have paid everyone double, on average (see para 784.2)
if the financial institution paid out everyone double, then it would fairly quickly notice the situation (see paras 785, 787-791) (this point I can accept)
(therefore, we infer) it didn't happen, to me or to anyone else

or, putting his argument another way

after a lot of effort making an educated guess, say that the institution's accounting systems suffer a glitch like this at a rate of one per 10,000 customers (or some other plausible number) and the amount of money involved (averaged over all customers) is (according to the average account size) say £3 (see paras 746-748)
this is a tiny fraction of £18000, the amount of error I claim to have seen (see para 751)
(therefore, we infer) it is most unlikely that this has happened to me and so I must be making this up

The bottom line of all this is obviously that it is possible for a system to make errors occasionally, and not to make them all the time.
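The gap between an average over everyone and the experience of the person affected can be made concrete with the illustrative figures above: a rate of one per 10,000 customers and an average of £3 per customer, both of which are "say" values rather than real data. The variable names are hypothetical.

```python
from fractions import Fraction

error_rate = Fraction(1, 10_000)     # illustrative: one glitch per 10,000 customers
average_per_customer = Fraction(3)   # illustrative: £3 averaged over all customers

# If the average is £3 and only one customer in 10,000 is affected,
# the amount per affected customer must be far larger than the average.
amount_per_affected = average_per_customer / error_rate

# Conversely, an £18,000 error at that rate adds little to the average.
average_from_18000 = Fraction(18_000) * error_rate

print(amount_per_affected)        # 30000
print(float(average_from_18000))  # 1.8
```

On these figures a tiny average is exactly what a rare error of tens of thousands of pounds would produce, so the small average cannot be evidence that the large error did not happen.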

So how do we evaluate the famous statement by Lord Hoffmann "It is notorious that one needs no expertise in electronics to be able to know whether a computer is working properly." (DPP v. McKeown and Jones [1997] 1 WLR 295, 201 C-D, https://publications.parliament.uk/pa/ld199697/ldjudgmt/jd970220/mcke02.htm ) in the context of this incident?

Apart from the fact that this is "expert" evidence given from the bench, and so not subject to cross-examination, by a person unqualified to give such evidence, it's just plain wrong.

Mostly the institution's systems work fine. This will be the experience of experts in electronics and non-experts alike. Sometimes (rarely), as on this occasion, they don't. Almost all experts and non-experts alike will be unaware of that. The tiny fraction of people affected, experts and non-experts alike, will be aware of it (at least when the amount involved is £18000).

Possibly the error was triggered by unusual or idiosyncratic human input. Or possibly human error, not caught by the computer-based accounting system. Who knows? And, in the context of prosecuting people on account of such errors, so what?