As We Evolve Science Policy, We Must Vigilantly Assess Its Changes
The U.S. National Institutes of Health (NIH), the largest public science funder in the world, has recently sought commentary on several substantial changes to its science policies. Most recently, NIH called for commentary on proposed changes to its peer-review process for fellowships, having previously called for comment on its general grant peer-review policies and on its open-access policies. The proposed changes in policy, and NIH’s responses to previous commentary, avoid key questions of how peer review should be evaluated and improved. Moreover, these proposed NIH policies underscore an endemic gap in evidence and the need for a systematic – scientific – approach to what we call “operational science policy”: the policies and practices that effectively control the conduct and communication of the formal, natural, and social sciences.
Operational science policies generally, and peer-review processes in particular, affect not only science’s impact and cost, but also its equity. For example, an independent cross-sectional analysis of NIH grant funding over the last 30 years, just published in JAMA, shows persistent gender, race, and ethnic inequities in the awarding of NIH grants, even after adjusting for the investigator’s career stage and degree. Further, a randomized experimental study conducted by NIH’s own Center for Scientific Review, published in 2021, found that double-blinding the review process reduced that bias by about half.
None of this research, or other research on peer review, is reflected in the proposed changes. Instead, NIH proposed to change the grouping and granularity of its current scoring criteria, asserting that these changes will reduce reputational bias and enable reviewers to focus on scientific merit. While these modest changes are not necessarily bad, NIH’s request for information does not discuss the rationale for them, how the stated goals relate to the previously highlighted inequities in funding, or how progress toward any of these goals will eventually be measured.
In March, NIH received 780 comments on its proposed changes to peer review. However, neither NIH’s response nor its subsequent proposal for changes to fellowship review suggests that it recognizes the need for scientific evaluation of its policies, or that it plans to update the proposed policies to address any of the issues raised in the comments.
Under-evaluation of science policy is endemic
NIH is not alone in initiating major changes to operational science policy without concurrent plans to study the effects of those changes. The absence of evaluation is widespread. For example, a new European Commission report on the deployment of open science stresses monitoring open science as an outcome, but does not evaluate whether open science is moving the research enterprise toward its goals. More broadly, the Reimagine Review registry now describes 62 initiatives to change peer review for publication. Fewer than 12 percent included any metrics or plans for evaluation (we calculated this figure by reviewing each entry in the registry).
The integrity and effectiveness of research depend fundamentally on peer review and other operational science policies. The last decade has seen an acceleration in initiatives to innovate in how science operates, particularly concerning peer review. Yet there is much scientific uncertainty about the effectiveness and mechanisms of peer review and other science practices. Making progress will therefore require that we thoughtfully design operational science policies to enable assessment and the accumulation of knowledge.
The pressure for change in operational science policies only continues to increase. As a result of the Nelson memo issued by the Office of Science and Technology Policy last summer, U.S. federal agencies are now under pressure to make changes to policies on open access, open data, research integrity, and equitable access. These goals are laudable, and changes in policy are needed to achieve them, but scientific assessment should be an integral part of the changes, not an afterthought.
Funders should support more research in metascience
Assessment requires resources and attention, yet funding for research to assess such initiatives and shape their future direction has lagged. The largest standing federal research program for science-of-science research is the National Science Foundation’s (NSF) Science of Science Program, which funded just over $4.5 million in new research in 2022. While it is good that NSF takes the lead among federal funders in this area, this program’s funding represents about 0.05 percent of the agency’s annual budget, and NSF’s annual budget is itself only a fraction of the total federal scientific research budget. NIH, which has an annual budget over six times that of NSF, currently has no comparable program for metascience.
Much of the progress in operational science policy has depended on the generosity of independent organizations, nonprofits, foundations, and universities, such as Arnold Ventures, the Chan-Zuckerberg Initiative, the Gordon and Betty Moore Foundation, the Howard Hughes Medical Institute, the Mellon Foundation, and the Sloan Foundation. Relying on the generosity of philanthropic organizations to support the science of science is neither sufficient in the short term nor sustainable in the long term, and some early leaders in these areas have shifted to other agendas. Instead of eliding the central questions of what peer review should and does achieve, NIH and other funding agencies could set a standard by designing policy changes with evaluation in mind, and then supporting those evaluation efforts.
Changes in science policies and practices should be treated, at least in part, as experiments. New policies and practices should be designed to add incrementally to our cumulative knowledge of how, and how well, science works. Rigorous evaluation of operational science policies should include identifying key elements of practices, along with outcome measures that can be used and compared across programs. Such interventions should be reproducible, with their implementation details shared to allow independent analysis and replication. And interventions should produce FAIR (findable, accessible, interoperable, and reusable) open data whenever possible, so that the scientific community can evaluate the outcomes of the policy. Science funders should also support complementary independent research in metascience, including experimental approaches, data-sharing infrastructure, evaluation methods, and theory development in the science of science, that would inform operational science policy across the field.
Operational policies are at the heart of science. While many policies for grant review, scholarly publication, and research transparency have served science and society well, there is an urgent need to update these policies and practices. However, new policies should integrate experimentation and evaluation. We should not expect publication to be perfect, nor should we expect changes to science policy to be guaranteed to succeed. Many questions about the design and effects of review and other operational science policies remain open. The scientific community should expect that when stakeholders such as funders and publishers make changes to operational policies, they build on scientific research, communicate their aims and rationale, and, most importantly, include mechanisms enabling the community to understand the change’s impact and accumulate knowledge about what works and what does not.