On Monday, March 7, 2016, the American Statistical Association posted an official statement on p-values and statistical significance: "The ASA's statement on p-values: context, process, and purpose." The statement is meant as guidance for practitioners of all kinds who use statistical significance tests. It is a loud proclamation of what statisticians have generally taught their students and clients, but some of those ideas get lost in the actual practice of analysis, and some practices lose touch with the ideas. Below are the six principles the statement outlines, with my comments. I encourage you to read the statement in full, along with the very informative preface on how the statement came into being. Here's the setup, verbatim from the preface:

In February, 2014, George Cobb, Professor Emeritus of Mathematics and Statistics at Mount Holyoke College, posed these questions to an ASA discussion forum:

Q: Why do so many colleges and grad schools teach p = .05?
A: Because that's still what the scientific community and journal editors use.

Q: Why do so many people still use p = 0.05?
A: Because that's what they were taught in college or grad school.

Cobb's concern was a long-worrisome circularity in the sociology of science based on the use of bright lines such as P < 0.05: "We teach it because it's what we do; we do it because it's what we teach."

Think of a p-value as the probability, under a specified statistical model, that a summary of the data would be at least as extreme as the value actually observed.

The six principles:

1. P-values can indicate how incompatible the data are with a specified statistical model.

When we choose a significance test we are testing assumptions about the nature of the underlying data. For example, in comparing two groups we often assume that both groups follow the same, frequently Normal, distribution, which implies that their means should be the same. When the assumptions don't fit (that is, the data don't fit the model), we consider rejecting the null hypothesis or revisiting our assumptions.
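To make this concrete, here is a minimal sketch in Python of a two-group permutation test; the data and names are made up for illustration, not taken from the ASA statement. Under the model that group labels are exchangeable, the p-value is simply the fraction of relabellings that produce a difference at least as extreme as the observed one.

```python
import random

random.seed(1)
# Hypothetical data: two groups of 30 observations each.
group_a = [random.gauss(0.0, 1.0) for _ in range(30)]
group_b = [random.gauss(0.5, 1.0) for _ in range(30)]

def mean(xs):
    return sum(xs) / len(xs)

observed = abs(mean(group_a) - mean(group_b))

# Null model: group labels are exchangeable, so shuffle the labels
# and count how often a relabelled difference is at least as extreme.
pooled = group_a + group_b
n_a = len(group_a)
extreme = 0
n_perm = 10_000
for _ in range(n_perm):
    random.shuffle(pooled)
    if abs(mean(pooled[:n_a]) - mean(pooled[n_a:])) >= observed:
        extreme += 1

p_value = (extreme + 1) / (n_perm + 1)  # add-one correction
print(f"observed |difference in means| = {observed:.3f}, p = {p_value:.4f}")
```

A small p-value here means the data look incompatible with the exchangeability model; that could reflect a real group difference or a failure of some other assumption, which is exactly the point of principle 1.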
If the means of the two groups are different, we have to consider that our assumptions about one or both groups are incorrect.

2. P-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone.

Let me just quote the statement for this one: "Researchers often wish to turn a p-value into a statement about the truth of a null hypothesis, or about the probability that random chance produced the observed data. The p-value is neither. It is a statement about data in relation to a specified hypothetical explanation, and is not a statement about the explanation itself."

3. Scientific conclusions and business or policy decisions should not be based only on whether a p-value passes a specific threshold.

Say it again: scientific conclusions and business or policy decisions should not be based only on whether a p-value passes a specific threshold! Statistics are frequently used to support our positions, but we had better have some other confirmatory evidence as well.

4. Proper inference requires full reporting and transparency.

Remember, our focus here is on p-values (there are plenty of other reasons for research transparency); the concern addressed is running multiple analyses and comparisons and reporting only certain results. "Cherry-picking … data dredging, significance chasing, significance questing, selective inference and 'p-hacking' leads to a spurious excess of statistically significant results and should be avoided."

5. A p-value, or statistical significance, does not measure the size of an effect or the importance of a result.

Our result is significant, but does it matter? What about clinical significance, for example? And can we measure it? Fortunately, the machine age lets us perform additional tests with little effort. We can learn to measure effect sizes, create reliable change indices, and do more with confidence intervals instead of relying on point estimates.

6.
By itself, a p-value does not provide a good measure of evidence regarding a model or hypothesis.

The significance test can be informative, and it can give us some idea of the strength of our hypothesis. But remember that most of the time the test is set up so that we want to reject the null hypothesis, and rejecting the null hypothesis isn't the same thing as confirming the truth of the alternative hypothesis. Once again, more evidence is required.

That the ASA made this announcement is significant news for statistics, science and data science. The statement includes a great bibliography for those interested in reading more on this topic. We live in the era of big data and easy statistical computing. Far from being dead, the old statistical methods are being deployed and redeployed far and fast, and new ones are rising. This is a good thing, and we shouldn't continue to teach significance testing in introductory stats the same way, nor accept significance tests as necessary and sufficient evidence in applied statistics and science.

Notes:

Citation: Ronald L. Wasserstein & Nicole A. Lazar (2016): The ASA's statement on p-values: context, process, and purpose, The American Statistician, DOI: 10.1080/00031305.2016.1154108

Link to the article: http://dx.doi.org/10.1080/00031305.2016.1154108

See also: Ronald P. Carver (University of Missouri-Kansas City), "The Case Against Statistical Significance Testing," Harvard Educational Review, Vol. 48, No. 3, August 1978.
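Postscript: principle 5 is easy to demonstrate with a short simulation. This is my own illustrative sketch in Python, not anything from the statement itself: with a large enough sample, a practically negligible shift between two groups typically produces a very small p-value even though the effect size (Cohen's d) stays trivial.

```python
import math
import random

random.seed(42)
# Hypothetical data: a practically negligible true shift of 0.02
# standard deviations, but a very large sample.
n = 200_000
a = [random.gauss(0.0, 1.0) for _ in range(n)]
b = [random.gauss(0.02, 1.0) for _ in range(n)]

def mean(xs):
    return sum(xs) / len(xs)

def var(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

diff = mean(b) - mean(a)
se = math.sqrt(var(a) / n + var(b) / n)
z = diff / se
p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal approximation
cohen_d = diff / math.sqrt((var(a) + var(b)) / 2)  # effect size in SD units

print(f"p = {p_value:.2e}, Cohen's d = {cohen_d:.3f}")
```

The p-value says the tiny shift is detectable, not that it matters; an effect-size measure or a confidence interval for the difference answers the "does it matter?" question.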
1 Comment
11/21/2016 05:24:50 pm
Statistics is a broad concept. There are a lot of inventions right now for better acquisition of data, but there are still complications that need to be settled. In view of the prevalent misuses of and misconceptions concerning p-values, some statisticians prefer to supplement or even replace p-values with other approaches. These include methods that emphasize estimation over testing, such as confidence, credibility, or prediction intervals. Good statistical practice emphasizes principles of good study design and conduct, a variety of numerical and graphical summaries of data, understanding of the phenomenon under study, interpretation of results in context, complete reporting, and proper logical and quantitative understanding of what data summaries mean.
Author
Jesse Sharp is an expert in the analysis of health care data. Passionate about data and the ethics of analysis, he writes on topics related to medicine, public health and statistics.
