
What Does It Mean If the Public is Decent at Predicting Replication?

October 5, 2020

With replication – and concerns about the lack of it – occupying much of the discussion about social and behavioral research, efforts to predict which research will render similar findings when repeated have also gained attention. Several notable efforts to either replicate findings or predict what will be replicable have arisen in the wake of what’s termed the “replication crisis,” with perhaps the highest-profile among them a quantitative effort by a U.S. military research agency to determine in advance which research will be reliable, using replicability as a proxy.

But could an average person just look at a project and intuit whether it would replicate? That is something that Suzanne Hoogeveen and Alexandra Sarafoglou, colleagues in the Department of Psychology at the University of Amsterdam, found some evidence for in new research, which they and fellow Amsterdam colleague Eric-Jan Wagenmakers published in Advances in Methods and Practices in Psychological Science. Hoogeveen and Sarafoglou answered some questions Social Science Space posed about their paper, “Laypeople Can Predict Which Social-Science Studies Will Be Replicated Successfully,” and that interview appears below the paper’s abstract.

Large-scale collaborative projects recently demonstrated that several key findings from the social-science literature could not be replicated successfully. Here, we assess the extent to which a finding’s replication success relates to its intuitive plausibility. Each of 27 high-profile social-science findings was evaluated by 233 people without a Ph.D. in psychology. Results showed that these laypeople predicted replication success with above-chance accuracy (i.e., 59%). In addition, when participants were informed about the strength of evidence from the original studies, this boosted their prediction performance to 67%. We discuss the prediction patterns and apply signal detection theory to disentangle detection ability from response bias. Our study suggests that laypeople’s predictions contain useful information for assessing the probability that a given finding will be replicated successfully.
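
To make the abstract’s signal detection terms concrete, here is a minimal sketch – with illustrative numbers, not the authors’ data or code – of how sensitivity (d′, the ability to tell replicable from non-replicable findings) is separated from response bias (the criterion c, e.g., an overall optimism about replication).

```python
# Minimal sketch (illustrative numbers, not the authors' analysis) of the
# signal detection quantities mentioned in the abstract: d' measures how well
# forecasters discriminate replicable from non-replicable findings, while the
# criterion c measures their overall response bias.
from scipy.stats import norm

def dprime_and_criterion(hit_rate: float, false_alarm_rate: float):
    """d' = z(hit) - z(false alarm); c = -0.5 * (z(hit) + z(false alarm))."""
    z_hit, z_fa = norm.ppf(hit_rate), norm.ppf(false_alarm_rate)
    return z_hit - z_fa, -0.5 * (z_hit + z_fa)

# Hypothetical rates: predicting "will replicate" for 80% of findings that did
# replicate, but also for 55% of findings that did not.
d_prime, criterion = dprime_and_criterion(0.80, 0.55)
print(f"d' = {d_prime:.2f}, c = {criterion:.2f}")  # negative c = liberal ("optimistic") bias
```

In this hypothetical case, a positive d′ indicates genuine above-chance discrimination, while the negative criterion reflects a general tendency to say “will replicate” regardless of the finding.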


There’s often a tension between the lay public and specialists in any realm, with research findings dismissed as either obvious in retrospect or ridiculous on their face. What do you feel your work tells us about how we should approach this ‘tyranny of common sense’ in the social sciences?

To some extent, our study indicates that researchers in the social sciences have neglected the aspect of common sense in the past decades. Research in the social sciences has put extreme, ‘sexy’ effects too much into the spotlight, and yet precisely these effects have turned out not to replicate and were in fact also not considered plausible by many scientists and non-scientists alike.

Of course, common sense cannot replace scientific evidence; in science we need evidence to substantiate our claims, and the mere fact that something sounds plausible is obviously not sufficient. As our study has shown, the predictions derived from common sense are not always correct.

At the same time, we should not ignore the information we can derive from common sense. As we mention in the paper, researchers could use laypeople’s intuitions to inform their decisions and anticipate the likelihood of finding a reliable effect. Common sense could, for instance, serve as a red flag; if laypeople think an effect is extremely unlikely to replicate, as researchers we should make sure we collect a lot of solid evidence to shift their beliefs and convince them otherwise.

What role do you think intuition and a priori assumptions play in setting the larger research agenda, for good and for bad?

One of the main reasons for the replication crisis in psychology and the social sciences in general is its ‘publish or perish’ culture. To increase the chances of one’s research being published in a high-impact journal, researchers must deliver findings that are surprising and ‘sexy’ and around which one can craft a good, convincing story.

Suzanne Hoogeveen, left, and Alexandra Sarafoglou

However, we are now learning that many of those effects that sounded too good to be true were probably indeed too good to be true – in other words, not replicable. An infamous example from psychology is a study that found that people became less religious after looking at a picture of Rodin’s The Thinker, because they were primed to think analytically instead of intuitively. In retrospect this may sound ridiculous, and indeed the authors themselves have acknowledged that the study was ‘silly.’ But note that the fact that the study did not replicate does not mean that the theory suggesting an association between religiosity and analytical thinking is invalid.

Of course, researchers still want to conduct relevant and impactful studies. Yet the field has made a considerable effort to distance itself from the culture of preferring creativity and novelty over robustness. In general, over the past decade sample sizes in our field have increased, it has become more accepted to conduct and publish replication studies, and we would argue that editors and reviewers, too, endorse rigor and transparency more now than they did a decade ago. Funding agencies have also shifted their focus; a lot more money now goes to large collaborative replication projects that incorporate open science principles (e.g., Many Labs) and to platforms that promote open science (e.g., the Center for Open Science, COS, and the Open Science Framework, OSF).

I found this line in your paper notable: “in the context of the bleak reality of the current replication rate in psychological science, laypeople are optimists.” I wonder if you might flesh out your comments about what researchers could do to recraft their analysis frameworks, whether in a Bayesian setting or outside of one, to perhaps warrant future optimism.

There are two ways of looking at this observation: yes, in comparison to the low replication rates of around 50-60 percent reported in the replication projects, laypeople overestimated replicability of findings. But that does not mean that 50-60 percent of all findings in social science are false or unreplicable. So we are actually optimistic that, as a result of the rapid changes that have been adopted in the social sciences, perhaps laypeople’s prediction of around 75 percent successful replications is in fact more realistic right now.

In any case, an important way to mitigate this ‘bleak reality of the current replication rate’ is simply being more cautious. Some have therefore argued for using alpha < .005 instead of .05 as the threshold for claiming significance (and hence the presence of an effect). In a Bayesian framework, incorporating caution is straightforward: researchers could adjust the prior odds for their hypotheses based on their a priori plausibility. Following the aphorism that “extraordinary claims require extraordinary evidence,” researchers would need to collect more evidence to shift conviction when the prior odds are low.
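
To illustrate the Bayesian logic described above, here is a minimal sketch – an illustration with made-up numbers, not the authors’ method – of how prior odds combine with a Bayes factor: the more skeptical the prior odds, the stronger the evidence needed before the posterior favors the claimed effect.

```python
# Minimal sketch (made-up numbers, not from the paper) of adjusting prior odds:
# posterior odds = Bayes factor * prior odds, so skeptical prior odds demand a
# larger Bayes factor before the evidence shifts conviction toward the effect.

def posterior_odds(prior_odds: float, bayes_factor: float) -> float:
    """Posterior odds for H1 over H0, given the data's Bayes factor (BF10)."""
    return bayes_factor * prior_odds

# Same evidence (BF10 = 6), different a priori plausibility of the claim:
for prior in (1.0, 0.25, 0.05):  # even odds, mildly skeptical, very skeptical
    post = posterior_odds(prior, 6.0)
    print(f"prior odds {prior:>4}: posterior odds {post:.2f}, "
          f"P(H1 | data) = {post / (1 + post):.2f}")
```

Under these assumed numbers, evidence that comfortably convinces a neutral observer (posterior probability about .86) leaves a very skeptical observer unconvinced (about .23), which is exactly the “extraordinary claims require extraordinary evidence” point.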

In a broader sense, increasing the replication rates in social science also means that we simply try to do better science and adhere to stricter methodological requirements, such as using large samples, preregistering our studies, and building on strong theories.

Should your results offer any cautionary lessons for an academic publishing culture that rewards novelty? Or might your results nudge us toward more incremental and safe investigation?

We think our study should reinforce the need for a culture change, but we’d like to stress that this change in culture is already happening as we speak, and that we are obviously not the first ones to note these problems. Rather, our study may present a telling illustration of what the field has been discussing over the past decade, and of the changes that have been made as a result (e.g., open science badges, and registered reports that shift weight from ‘interesting’ results to solid research questions and designs).

Could you talk about any mechanisms that might make use of your findings about ‘useful information’? And how might they impact both replication market efforts and more algorithmic efforts like the U.S. military research agency’s SCORE project?

In a practical sense, laypeople’s intuitions may be used in the selection process for replication studies. Depending on the replicator’s goal, laypeople’s overall belief, skepticism, or uncertainty could help single out studies that are most likely to replicate, least likely to replicate, or ambiguous. The ambitious SCORE project, for instance, started from a very large initial sample of studies from which replication candidates had to be selected. If the goal is to estimate the overall replication rate in psychological science, randomly selecting studies would be a good strategy, but if you want an informed selection system, eliciting laypeople’s predictions may be useful.

For replication markets and surveys that are now becoming more popular as part of replication efforts, our study suggests that laypeople’s predictions could also be used to complement experts’ predictions. The replication markets platform, for instance, also invites non-researchers to participate in prediction markets.

What’s next?

Relatedly, we think our paradigm of eliciting predictions about the plausibility of research claims may prove a useful tool to quickly generate a lot of predictions, in addition to prediction markets and expert ratings.

For Bayesian inference, it may be used directly for individual studies: researchers can adjust the prior odds in favor of a hypothesis based on laypeople’s predictions.

Most of all, we hope that our findings add to the many great efforts to map out the current status of the field and to convince the scientific community of the need to make some changes in the way we conduct research. Luckily, we think this is indeed increasingly widely acknowledged and many great tools and methods have been offered to take the necessary steps to improve social science research.
