Understanding some simple statistics in news stories about medical treatments
Every day we get bombarded in the news with health statistics. Coffee causes cancer! Coffee cures cancer! And so on. Many of these are meant to grab headlines (and, these days, web page clicks), and the articles they accompany are often very poor at telling the reader what the numbers mean. They often cite statistics, and health statistics can be complicated. Sad to say, even many physicians are pretty poor at sorting out the hype from the helpful. This article is a very helpful guide to finding your way when you are reading the health news. It’s called “Helping Doctors and Patients Make Sense of Health Statistics,” and it does just that. I’d bookmark it or even print it out for future reference, if you’re more old school. Don’t be put off by the somber-looking first page; it’s actually quite readable. I should point out here that, although I took statistics courses long ago and have used simple statistical tests in my own research career, I am by no means an expert. I always consulted a real statistician before submitting any research for publication.
The article starts with a common problem in media reports: the difference between absolute and relative risk. The authors use the example of a scare over birth control pills that happened in 1995, when the U.K. Committee on Safety of Medicines issued a warning about a newer version of the pill. The committee sent a warning to all physicians that the newer pills were associated with a 100% rise in risk for serious blood clots. One hundred percent! Yikes! The warning led many women to stop taking their pills, and there was a predictable rise in unwanted pregnancies, accompanied by an estimated 15,000 more abortions the following year. The effects lasted for years. What was the truth about this new risk?
The truth was that the newer pills were associated with a risk for serious blood clots of 2 per 7,000 women. For comparison, the earlier pills had been associated with a risk of 1 per 7,000 women. And 2 is 100% more than 1, so that’s the increase in relative risk. But the increase in absolute risk was an additional 1 woman in 7,000. (It should be noted here that pregnancy itself is associated with an increased risk of blood clots.) I see this particular misunderstanding often in news reports regarding risks of medical procedures. When you read these stories, look past the relative risk, which often makes for good headlines, to the actual numbers behind the percent change.
The pill scare hurt women, hurt the National Health Service, and even hurt the pharmaceutical industry. Among the few to profit were the journalists who got the story on the front page.
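For readers who like to see the arithmetic spelled out, here is a minimal sketch of the pill-scare numbers in Python. The function name and the dictionary keys are my own, chosen only for illustration; the inputs (1 and 2 cases per 7,000 women) are the figures quoted above.

```python
from fractions import Fraction


def risk_summary(baseline_cases, new_cases, population):
    """Compare two risks expressed as cases per `population` people.

    Hypothetical helper for illustration only.
    """
    baseline = Fraction(baseline_cases, population)
    new = Fraction(new_cases, population)
    absolute_increase = new - baseline
    # Relative increase: the change as a percentage of the baseline risk.
    relative_increase_pct = float(absolute_increase / baseline * 100)
    return {
        "relative_increase_pct": relative_increase_pct,
        "absolute_increase": absolute_increase,
    }


# Older pills: 1 clot per 7,000 women. Newer pills: 2 per 7,000.
summary = risk_summary(1, 2, 7000)
print(f"Relative increase: {summary['relative_increase_pct']:.0f}%")
print(f"Absolute increase: {summary['absolute_increase']} (one extra woman per 7,000)")
```

The same underlying change shows up as "100%" in relative terms and as 1 in 7,000 in absolute terms, which is why a headline that gives only the percentage tells you so little.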
There is also this helpful article, from the always readable Scientific American. I like it a lot, too. It reminds us how statistical significance doesn’t always mean real life significance.
Imagine if there were a simple single statistical measure everybody could use with any set of data and it would reliably separate true from false. Oh, the things we would know! Unrealistic to expect such wizardry though, huh? Yet, statistical significance is commonly treated as though it is that magic wand. Take a null hypothesis or look for any association between factors in a data set and abracadabra! Get a “p value” over or under 0.05 and you can be 95% certain it’s either a fluke or it isn’t. You can eliminate the play of chance! You can separate the signal from the noise! Except that you can’t. That’s not really what testing for statistical significance does. And therein lies the rub.
The article points out that the vaunted, and too often venerated, p value only estimates the probability of getting a result at least as extreme as the one observed if the hypothesis being tested (usually the “null” hypothesis that there is no real effect) is assumed to be true. It can’t on its own tell you whether this assumption was right, or whether the results would hold true in different circumstances. It provides a limited picture of probability, taking limited information about the data into account and giving only “yes” or “no” as options.
If you are really interested in this topic you should read a bit about what’s called Bayesian statistics, named after an 18th-century mathematician. The basic notion here is that we need to consider our prior knowledge about something before applying statistical tests, and that we should factor in this knowledge when we make our statistical comparisons. In other words, not all possibilities are equally likely going into the analysis. The debates between Bayesian and what are termed “frequentist” statisticians go back and forth. But what we should take home from these debates is that the science of statistics, like other sciences, is subject to revision and change over time.
A final key point is to look at headlines about new medical breakthroughs and try to decide if the findings really are “significant” in real life. Is there only a tiny effect, really, even though the p value is “significant” at the 0.05 level? Also beware of what we call “data dredging,” in which multiple comparisons are made using the same data set. When you do that, the chances of coming up with a significant yet spurious association go up.
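To get a rough feel for how quickly data dredging inflates the odds of a spurious finding, here is a small Python sketch. It rests on two simplifying assumptions of mine that real studies rarely meet exactly: every comparison is independent of the others, and there is no true effect anywhere in the data. The function name is hypothetical.

```python
def chance_of_false_positive(num_comparisons, alpha=0.05):
    """Probability that at least one of `num_comparisons` tests looks
    'significant' at level `alpha` purely by chance.

    Assumes independent tests and no real effects (illustrative only).
    """
    # Each test avoids a false positive with probability (1 - alpha),
    # so all of them avoid one with probability (1 - alpha) ** n.
    return 1 - (1 - alpha) ** num_comparisons


for n in (1, 5, 20, 100):
    print(f"{n:>3} comparisons: {chance_of_false_positive(n):.0%} "
          "chance of at least one spurious 'significant' result")
```

Under these assumptions, a single comparison carries the familiar 5% chance of a fluke, but twenty comparisons on the same data set push the chance of at least one fluke to roughly 64%.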
All of this has made some people call for some rudimentary statistical training to be part of the standard mathematics curriculum at the high school level. I think this is a good idea. I didn’t get introduced to any statistical concepts in high school, and I took all the math available. That should change if we expect the mass of our citizenry to be competent to judge things for themselves. Medical journalists definitely need this knowledge because currently many do a terrible job interpreting medical reports. The two articles I linked are a great place to start.