On Monday, March 7, 2016 the American Statistical Association posted an official statement on p-values and statistical significance: The ASA’s statement on p-values: context, process, and purpose. The statement is meant as guidance to practitioners of all kinds who use statistical significance tests. It is a loud proclamation of what statisticians have generally taught their students and clients, yet these ideas often get lost in the day-to-day practice of analysis, and through practices that have lost touch with the ideas.
Below are the six principles the statement outlines, with my comments. But I encourage you to read the statement in full, along with the very informative preface on how the statement came into being.
Here’s the set-up, verbatim from the preface:
In February, 2014, George Cobb, Professor Emeritus of Mathematics and Statistics at Mount Holyoke College, posed these questions to an ASA discussion forum:
Q: Why do so many colleges and grad schools teach p = .05?
A: Because that's still what the scientific community and journal editors use.
Q: Why do so many people still use p = 0.05?
A: Because that's what they were taught in college or grad school.
Cobb’s concern was a long-worrisome circularity in the sociology of science based on the use of bright lines such as P < 0.05: “We teach it because it’s what we do; we do it because it’s what we teach.”
Think of a p-value as the probability, assuming the specified null model is true, of observing a result at least as extreme as the one we actually observed.
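That definition can be illustrated directly by simulation. Here is a minimal sketch with a made-up example: we see 60 heads in 100 flips of a supposedly fair coin, and we estimate the p-value as the fraction of simulated fair-coin experiments that produce a result at least as extreme.

```python
import random

random.seed(1)

observed_heads = 60
n_flips = 100
n_sims = 100_000

# Simulate the null model (a fair coin) many times and count how often
# it produces a result at least as extreme as the observed one.
extreme = sum(
    sum(random.random() < 0.5 for _ in range(n_flips)) >= observed_heads
    for _ in range(n_sims)
)
p_value = extreme / n_sims
print(f"estimated one-sided p-value: {p_value:.3f}")  # roughly 0.03
```

The exact binomial answer is about 0.028; the simulation merely approximates it, which is the point: the p-value is a statement about the null model's behavior, not about the coin's "true" bias.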
The six principles:
1. P-values can indicate how incompatible the data are with a specified statistical model.
When we choose a significance test we are testing how compatible the data are with a specified statistical model. For example, in comparing two groups the null model often assumes that both groups are drawn from the same distribution, frequently a Normal one, so that any difference in their sample means is due to chance alone.
When the data don’t fit the model, for instance the observed difference in means is larger than chance would plausibly produce, we consider rejecting the null hypothesis or revisiting our assumptions about one or both of the groups.
2. P-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone.
Let me just quote the statement for this one:
Researchers often wish to turn a p-value into a statement about the truth of a null hypothesis, or about the probability that random chance produced the observed data. The p-value is neither. It is a statement about data in relation to a specified hypothetical explanation, and is not a statement about the explanation itself.
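One way to see why a p-value is not the probability that the null hypothesis is true: when the null model is exactly true, p-values are uniformly distributed, so "significant" results still appear at the threshold rate. A toy sketch, repeatedly testing a genuinely fair coin with a normal-approximation p-value:

```python
import math
import random

random.seed(3)

def binom_p_two_sided(heads, n):
    # Normal approximation to the two-sided p-value for a fair-coin null.
    z = abs(heads - n / 2) / math.sqrt(n / 4)
    return math.erfc(z / math.sqrt(2))

n_experiments = 5_000
n_flips = 400
hits = 0
for _ in range(n_experiments):
    heads = sum(random.random() < 0.5 for _ in range(n_flips))
    if binom_p_two_sided(heads, n_flips) < 0.05:
        hits += 1

rate = hits / n_experiments
print(f"fraction 'significant' under a true null: {rate:.3f}")  # near 0.05
```

Every one of those small p-values came from a true null, so a p-value below 0.05 plainly cannot mean "the null has only a 5% chance of being true."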
3. Scientific conclusions and business or policy decisions should not be based only on whether a p-value passes a specific threshold.
Say it again: Scientific conclusions and business or policy decisions should not be based only on whether a p-value passes a specific threshold! Statistics are frequently used to support our positions, but we had better have some other confirmatory evidence as well.
4. Proper inference requires full reporting and transparency.
Remember, our focus is on p-values. There are plenty of reasons for research transparency, but the concern addressed here is running multiple analyses and comparisons and reporting only certain results. “Cherry-picking … data dredging, significance chasing, significance questing, selective inference and ‘p-hacking’ leads to a spurious excess of statistically significant results and should be avoided.”
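A toy demonstration of the multiple-comparisons problem the statement warns about, using pure noise: run many tests on data with no real effects, report only the "winners," and spurious significance is nearly guaranteed. All data here are fake.

```python
import math
import random

random.seed(11)

def two_sided_p(sample):
    # One-sample z-test of mean 0 (normal approximation, known sd = 1).
    n = len(sample)
    z = abs(sum(sample) / n) * math.sqrt(n)
    return math.erfc(z / math.sqrt(2))

n_tests = 100
n_obs = 30
# Every "effect" is pure noise, yet about 5 of 100 will reach p < 0.05.
significant = [
    t for t in range(n_tests)
    if two_sided_p([random.gauss(0, 1) for _ in range(n_obs)]) < 0.05
]
print(f"{len(significant)} of {n_tests} null 'effects' reached p < 0.05")
```

Reporting only the tests in `significant`, without disclosing the other ninety-odd, is exactly the selective inference the statement cautions against.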
5. A p-value, or statistical significance, does not measure the size of an effect or the importance of a result.
Our result is significant, but does it matter? What about clinical significance, for example? And can we measure it? Fortunately the machine age lets us perform additional tests with little effort. We can learn to measure effect sizes, create reliable change indices, and do more with confidence intervals instead of relying on point estimates.
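A minimal sketch of looking beyond the p-value, with made-up data: Cohen's d for the size of an effect, and a rough 95% confidence interval for the mean difference using a normal approximation.

```python
import math

# Hypothetical outcome scores for two groups (invented for illustration).
treated = [7.2, 6.8, 7.9, 7.1, 6.5, 7.4, 8.0, 7.3]
control = [6.9, 6.4, 7.0, 6.6, 6.8, 6.2, 7.1, 6.7]

def mean(xs):
    return sum(xs) / len(xs)

def var(xs):
    # Sample variance (n - 1 denominator).
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

diff = mean(treated) - mean(control)

# Cohen's d: mean difference scaled by the pooled standard deviation.
pooled_sd = math.sqrt((var(treated) + var(control)) / 2)
cohens_d = diff / pooled_sd

# Rough 95% CI for the mean difference (normal approximation).
se = math.sqrt(var(treated) / len(treated) + var(control) / len(control))
ci = (diff - 1.96 * se, diff + 1.96 * se)

print(f"mean difference: {diff:.2f}")
print(f"Cohen's d: {cohens_d:.2f}")
print(f"approx 95% CI: ({ci[0]:.2f}, {ci[1]:.2f})")
```

The interval and the standardized effect size answer the "does it matter?" question in a way a bare p-value never can.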
6. By itself, a p-value does not provide a good measure of evidence regarding a model or hypothesis.
The significance test can be informative, and it can give us some idea of the strength of our hypothesis. But remember, most of the time the test is set up so that we want to reject the null hypothesis. Rejecting the null hypothesis isn’t the same thing as confirming the truth of the alternative hypothesis. Once again, more evidence is required.
That the ASA made this announcement is significant news for statistics, science and data science. The statement includes a great bibliography for those interested in reading more on this topic. We live in the era of big data and easy statistical computing. Far from being dead, the old statistical methods are being deployed and redeployed faster and more widely than ever, and new ones are emerging. This is a good thing, and we should neither continue to teach significance testing in introductory stats the same way nor accept significance tests as necessary and sufficient evidence in applied statistics and science.
Ronald L. Wasserstein & Nicole A. Lazar (2016). The ASA's statement on p-values: context, process, and purpose. The American Statistician. DOI: 10.1080/00031305.2016.1154108. Link to the article: http://dx.doi.org/10.1080/00031305.2016.1154108
Ronald P. Carver (1978). The Case Against Statistical Significance Testing. Harvard Educational Review, Vol. 48, No. 3, August 1978. University of Missouri-Kansas City.
Jesse Sharp is an expert in the analysis of health care data. Passionate about data and the ethics of analysis, he writes on topics related to medicine, public health and statistics.