– By Andrew Tait, Chief Technology Officer, Decision Mechanics Limited

Science News recently published a great article on the use and abuse of statistics.

*Odds Are, It’s Wrong*, by Tom Siegfried, highlights some of the problems associated with testing hypotheses using statistical methods. These problems are well known within the statistics community, with “hundreds” of papers having been written on the subject. As Siegfried pithily observes, “if you believe what you read in the scientific literature, you shouldn’t believe what you read in the scientific literature”.

Statistical analysis is often presented in support of decisions. It’s rarely challenged. Yes, it’s sometimes *ignored* if it contradicts the views of key stakeholders. But it’s rarely *challenged*. This, we would suggest, is a consequence of:

- statistical analyses being considered to be objective; and
- decision-makers not being confident enough to dig around in the details.

Combine the reluctance to challenge statistical analyses with their widespread abuse and you clearly have cause for concern.

The article points out that, even among scientists, statistical literacy leaves much to be desired. For example:

- Statistical significance (e.g. p < 0.05) is often presented as a “black or white” binary concept. However, the use of 0.05 (or 0.01, or 0.001) as the cut-off is completely arbitrary. In fact, with a p value of 0.05, a result this extreme would arise by chance alone 1 time in 20. Is that acceptable given the decision you have to make?
- Statistical significance at the 0.05 level is commonly equated to 95% certainty that the result could not have occurred by chance. This isn’t the case. You can’t draw conclusions about the likelihood of the hypothesis being correct based on its statistical significance. The correct interpretation, given a p value of 0.05, is that there is only a 5% chance of getting a result at least as extreme as the one observed if no real effect is present.
- Studies also tend to equate *statistical* significance with *practical* significance. An example given in the article is that an expensive new drug may be *statistically* significantly better than an old one, but, if it only provides one new cure for every 1000 patients, that’s not of much practical use.
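A quick simulation makes the correct interpretation concrete. The sketch below (our illustration, not from Siegfried’s article) tests a coin we *know* is fair, so every “significant” result is, by construction, a fluke — and roughly 1 in 20 experiments produces one anyway:

```python
import random
import math

random.seed(42)

def p_value_fair_coin(heads, n):
    """Two-sided p-value for observing `heads` in `n` flips of a fair coin,
    using the normal approximation to the binomial."""
    mean, sd = n / 2, math.sqrt(n) / 2
    z = abs(heads - mean) / sd
    # Two-sided tail probability from the standard normal CDF.
    return 2 * (1 - 0.5 * (1 + math.erf(z / math.sqrt(2))))

# Run 10,000 "experiments" on a fair coin -- i.e. no real effect exists.
trials, n_flips = 10_000, 100
false_alarms = 0
for _ in range(trials):
    heads = sum(random.random() < 0.5 for _ in range(n_flips))
    if p_value_fair_coin(heads, n_flips) < 0.05:
        false_alarms += 1

print(f"'Significant' results with no effect present: {false_alarms / trials:.1%}")
```

The printed rate hovers around 5%: that’s what p < 0.05 actually promises — a bound on how often chance alone fools you — not 95% confidence that any particular significant result is real.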

A final interesting observation made by Siegfried concerns random trials. Selecting groups at random provides no guarantee that they exhibit random traits with respect to the phenomenon of interest. Let’s say we walk out onto the street right now and select two groups of five complete strangers. How likely is it that both groups would express similar political views, for example? When discussing (as Siegfried does) drug trials, there are countless dimensions on which patients can vary.
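It’s easy to check how often small random groups come out lopsided. The sketch below (our illustration, with an assumed 50/50 split of some binary trait in the population) randomly divides ten people into two groups of five and counts how often the groups differ substantially on that one trait:

```python
import random

random.seed(1)

def imbalance(group_a, group_b):
    """Absolute difference in the number of trait-holders between two groups."""
    return abs(sum(group_a) - sum(group_b))

# Each "person" either holds a given view (True) or not (False), 50/50 odds.
trials, group_size = 10_000, 5
lopsided = 0
for _ in range(trials):
    people = [random.random() < 0.5 for _ in range(2 * group_size)]
    random.shuffle(people)  # random assignment into two groups
    group_a, group_b = people[:group_size], people[group_size:]
    if imbalance(group_a, group_b) >= 3:  # e.g. a 4-of-5 vs 1-of-5 split
        lopsided += 1

print(f"Groups differing by 3+ people on the trait: {lopsided / trials:.1%}")
```

Roughly one random split in ten is badly imbalanced on this *single* trait — and a real drug trial must worry about many traits at once, so small randomised groups are almost certain to be unbalanced on something.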

Humorist Evan Esar was clearly onto something when he defined statistics as “the science of producing unreliable facts from reliable figures”.