DARPA Aims to Score Social and Behavioral Research
The Pentagon’s innovation incubator has set itself an ambitious task – ranking the reliability of social science research that might apply to national security. The Defense Advanced Research Projects Agency’s Defense Sciences Office is currently asking for “innovative research proposals” to algorithmically assign a confidence score to social and behavioral research.
DARPA has named the program, which aims to develop an artificially intelligent quantitative metric, Systematizing Confidence in Open Research and Evidence, or SCORE. As DARPA explains in its request for proposals:
These tools will assign explainable confidence scores with a reliability that is equal to, or better than, the best current human expert methods. If successful, SCORE will enable [Department of Defense] personnel to quickly calibrate the level of confidence they should have in the reproducibility and replicability of a given SBS [social and behavioral sciences] result or claim, and thereby increase the effective use of SBS literature and research to address important human domain challenges, such as enhancing deterrence, enabling stability, and reducing extremism.
Outside observers have identified a wider collateral benefit to the academy from the proposal – a tool to address the so-called replication crisis in social science. An article by Adam Rogers at Wired, for example, is headlined “Darpa Wants to Solve Science’s Reproducibility Crisis With AI.”
DARPA implies that the replication crisis is itself a national security concern: “Taken in the context of growing numbers of journals, articles, and preprints, this current state of affairs could result in an SBS consumer mistakenly over-relying on weak SBS research or dismissing strong SBS research entirely.”
Last month, DARPA signed the Center for Open Science (COS) to a three-year agreement, worth $7.6 million, to create a database of 30,000 claims made in peer-reviewed and published papers. Alongside partners from the University of Pennsylvania and Syracuse University, COS will extract – automatically and manually – evidence about the claims, which will be merged with more traditional quality indicators like citations and whether the research was preregistered.
Three steps will follow once the database exists:
- Experts will assess 10 percent of the claims for their likelihood of being replicated, using surveys, panels and even prediction markets.
- Other experts will create algorithms to examine the database’s contents and determine, artificially, their likelihood of being replicated.
- Other researchers will attempt to replicate a sample of the database’s claims, allowing both the humans’ and the computers’ efforts to be measured and scored.
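The three steps above amount to a forecasting contest: humans and algorithms each estimate how likely a claim is to replicate, and the replication attempts supply the answer key. The sketch below shows one way such forecasts could be scored, using the Brier score (the mean squared difference between a predicted probability and the 0/1 outcome). This is an illustrative assumption, not a metric that DARPA or COS has announced for SCORE, and the function name and all the numbers are hypothetical.

```python
def brier_score(predicted_probs, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes.

    Lower is better: 0.0 is a perfect forecast, and always guessing 0.5
    earns 0.25 regardless of the outcomes.
    """
    if len(predicted_probs) != len(outcomes):
        raise ValueError("predictions and outcomes must have the same length")
    if not outcomes:
        raise ValueError("at least one claim is required")
    squared_errors = [(p - o) ** 2 for p, o in zip(predicted_probs, outcomes)]
    return sum(squared_errors) / len(squared_errors)


# Hypothetical results for five claims: 1 = replicated, 0 = did not replicate.
replicated = [1, 0, 1, 1, 0]

# Hypothetical replication probabilities from experts and from an algorithm.
expert_probs = [0.8, 0.3, 0.6, 0.9, 0.4]
algorithm_probs = [0.7, 0.2, 0.5, 0.8, 0.1]

expert_score = brier_score(expert_probs, replicated)
algorithm_score = brier_score(algorithm_probs, replicated)

print(f"Expert Brier score:    {expert_score:.3f}")
print(f"Algorithm Brier score: {algorithm_score:.3f}")
```

Under this scoring rule, whichever forecaster ends up with the lower number was closer to the replication results on that sample of claims.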
Appropriately, COS says its own work needs to be reproducible. “We are committed to transparency of process and outcomes so that we are accountable to the research community to do the best job that we can,” said COS program manager Beatrix Arendt, “and so that all of our work can be scrutinized and reproduced for future research that will build on this work.”
“Whatever the outcome,” according to Brian Nosek, COS’ executive director, “we will learn a ton about the state of science and how we can improve.”
Rogers quotes Microsoft sociologist Duncan Watts about the audacity of creating a scoring mechanism: “It’s such a DARPA thing to do, where they’re like, ‘We’re DARPA, we can just blaze in there and do this super-hard thing that nobody else has even thought about touching.’” Watts then adds, “Good for them, man.” (Further demonstrating its chutzpah, DARPA has specifically excluded from SCORE proposals “research that primarily results in evolutionary improvements to the existing state of practice.”)
Ideally, the scores and the way they were determined would be understandable to a non-specialist. In addition, the scores could change based on new information.
As it tries to grade social and behavioral research, DARPA clearly acknowledges the need to fully embrace social science. “Given the accelerating sociotechnical complexity of today’s world—a world that is increasingly connected but often poorly understood—there are growing calls to more effectively leverage Social and Behavioral Sciences (SBS) to help address critical complex national security challenges in the Human Domain,” DARPA wrote in a 41-page document announcing the program in June 2018.
In addition to citing work that has obvious applications to security, such as reducing extremism, the document cited other federal projects that have explicitly connected SBS and the Pentagon, such as the National Academies of Sciences, Engineering, and Medicine’s Decadal Survey of Social and Behavioral Sciences for Applications to National Security and the Minerva Research Initiative (“Supporting social science for a safer world”).