International Debate

What’s That? The Replication Crisis is Good for Science?

April 10, 2019 2228

Cris question mark
(Image:Gerd Altmann from Pixabay)

Science is in the midst of a crisis: A surprising fraction of published studies fail to replicate when the procedures are repeated.

For example, take the study, published in 2007, that claimed that tricky math problems requiring careful thought are easier to solve when presented in a fuzzy font. When researchers found in a small study that using a fuzzy font improved performance accuracy, it supported a claim that encountering perceptual challenges could induce people to reflect more carefully.

However, 16 attempts to replicate the result failed, definitively demonstrating that the original claim was erroneous. Plotted together on a graph, the studies formed a perfect bell curve centered around zero effect. As is frequently the case with failures to replicate, of the 17 total attempts, the original had both the smallest sample size and the most extreme result.

The Conversation logo
This article by Eric Loken originally appeared at The Conversation, a Social Science Space partner site, under the title “The replication crisis is good for science”

The Reproducibility Project, a collaboration of 270 psychologists, has attempted to replicate 100 psychology studies, while a 2018 report examined studies published in the prestigious scholarly journals Nature and Science between 2010 and 2015. These efforts find that about two-thirds of studies do replicate to some degree, but that the strength of the findings is often weaker than originally claimed.

Is this bad for science? It’s certainly uncomfortable for many scientists whose work gets undercut, and the rate of failures may currently be unacceptably high. But, as a psychologist and a statistician, I believe confronting the replication crisis is good for science as a whole.

Practicing good science

First, these replication attempts are examples of good science operating as it should. They are focused applications of the scientific method, careful experimentation and observation in the pursuit of reproducible results.

Many people incorrectly assume that, due to the “p<.05” threshold for statistical significance, only 5 percent of discoveries will prove to be errors. However, 15 years ago, physician John Ioannidis pointed to some fallacies in that assumption, arguing that false discoveries made up the majority of the published literature. Replication efforts are confirming that the false discovery rate is much higher than 5 percent.

Awareness about the replication crisis appears to be promoting better behavior among scientists. Twenty years ago, the cycle for publication was basically complete after a scientist convinced three reviewers and an editor that the work was sound. Yes, the published research would become part of the literature, and therefore open to review – but that was a slow-moving process.

Today, the stakes have been raised for researchers. They know that there’s the possibility that their study might be reviewed by thousands of opinionated commenters on the internet or by a high-profile group like the Reproducibility Project. Some journals now require scientists to make their data and computer code available, which makes it likelier that others will catch errors in their work. What’s more, some scientists can now “preregister” their hypotheses before starting their study – the equivalent of calling your shot before you take it.

Combined with open sharing of materials and data, preregistration improves the transparency and reproducibility of science, hopefully ensuring that a smaller fraction of future studies will fail to replicate.

While there are signs that scientists are indeed reforming their ways, there is still a long way to go. Out of the 1,500 accepted presentations at the annual meeting for the Society for Behavioral Medicine in March, only one in four of the authors reported using these open science techniques in the work they presented.

Improving statistical intuition

Finally, the replication crisis is helping improve scientists’ intuitions about statistical inference.

Researchers now better understand how weak designs with high uncertainty – in combination with choosing to publish only when results are statistically significant – produce exaggerated results. In fact, it is one of the reasons more than 800 scientists recently argued in favor of abandoning statistical significance testing.

We also better appreciate how isolated research findings fit into the broader pattern of results. In another study, Ionnadis and oncologist Jonathan Schoenfeld surveyed the epidemiology literature for studies associating 40 common food ingredients with cancer. There were some broad consistent trends – unsurprisingly, bacon, salt and sugar are never found to be protective against cancer.

But plotting the effects from 264 studies produced a confusing pattern. The magnitudes of the reported effects were highly variable. In other words, one study might say that a given ingredient was very bad for you, while another might conclude that the harms were small. In many cases, the studies even disagreed on whether a given ingredient was harmful or beneficial.

Each of the studies had at some point been reported in isolation in a newspaper or a website as the latest finding in health and nutrition. But taken as a whole, the evidence from all the studies was not nearly as definitive as each single study may have appeared.

Schoenfeld and Ioannidis also graphed the 264 published effect sizes. Unlike the fuzzy font replications, their graph of published effects looked like the tails of a bell curve. It was centered at zero with all the nonsignificant findings carved out. The unmistakable impression from seeing all the published nutrition results presented at once is that many of them might be like the fuzzy font result – impressive in isolation, but anomalous under replication.

The breathtaking possibility that a large fraction of published research findings might just be serendipitous is exactly why people speak of the replication crisis. But it’s not really a scientific crisis, because the awareness is bringing improvements in research practice, new understandings about statistical inference and an appreciation that isolated findings must be interpreted as part of a larger pattern.

Rather than undermining science, I feel that this is reaffirming the best practices of the scientific method.


Eric Loken is an associate professor in the Department of Educational Psychology, affiliated with the Measurement, Evaluation and Assessment program, at the University of Connecticut. His interests focus on latent variable models, Bayesian inference, and methods for reproducible science. He received his Ph.D. from Harvard University and studies advanced statistical modeling with applications to large scale educational testing.

View all posts by Eric Loken

Related Articles

Megan Stevenson on Why Interventions in the Criminal Justice System Don’t Work
Social Science Bites
July 1, 2024

Megan Stevenson on Why Interventions in the Criminal Justice System Don’t Work

Read Now
How ‘Dad Jokes’ Help Children Learn How To Handle Embarrassment
Insights
June 14, 2024

How ‘Dad Jokes’ Help Children Learn How To Handle Embarrassment

Read Now
How Social Science Can Hurt Those It Loves
Ethics
June 4, 2024

How Social Science Can Hurt Those It Loves

Read Now
Digital Scholarly Records are Facing New Risks
Research
May 21, 2024

Digital Scholarly Records are Facing New Risks

Read Now
Analyzing the Impact: Social Media and Mental Health 

Analyzing the Impact: Social Media and Mental Health 

The social and behavioral sciences supply evidence-based research that enables us to make sense of the shifting online landscape pertaining to mental health. We’ll explore three freely accessible articles (listed below) that give us a fuller picture on how TikTok, Instagram, Snapchat, and online forums affect mental health. 

Read Now
New Fellowship for Community-Led Development Research of Latin America and the Caribbean Now Open

New Fellowship for Community-Led Development Research of Latin America and the Caribbean Now Open

Thanks to a collaboration between the Inter-American Foundation (IAF) and the Social Science Research Council (SSRC), applications are now being accepted for […]

Read Now
The Long Arm of Criminality

The Long Arm of Criminality

David Canter considers the daily reminders of details of our actions that have been caused by criminality.

Read Now
0 0 votes
Article Rating
Subscribe
Notify of
guest

This site uses Akismet to reduce spam. Learn how your comment data is processed.

0 Comments
Inline Feedbacks
View all comments