Open Access

Who Actually Makes Use of Open Access Research? We Looked at US National Academies Reports

By Ameet Doshi, Diana Hicks, Matteo Zullo and Omar I. Asensio

May 18, 2022

A central argument in favor of open access is the claim that the public benefits from having direct access to research. Beginning with the earliest open access manifestos, the Budapest Open Access Initiative (2002), the Berlin Declaration on Open Access to Science and the Humanities (2003) and the Bethesda Statement on Open Access Publishing (2003), OA adherents advanced their argument based on first principles: that the public has an inherent right to publicly-funded research. Most of these manifestos explicitly include non-researchers and the lay public as potential intended audiences for open access literature.

Yet, beyond invocations of noblesse oblige to “wider society” and utopian hopes to feed “curious minds,” OA conventions and manifestos largely ignore the nature of use by the general public. Instead, these declarations functioned as statements of intent, prompting action to expand paywall-free access to research. Even when detail is provided, the imagined uses of open access materials often remain within the research realm: supporting under-resourced scholars operating in the Global South, for instance, or speeding the pace of discovery and innovation within the triple helix of university, industry and government sectors. While these are undoubtedly valid justifications for expanding access to research, left out of these potential user communities is the dark universe of people who are not research scientists or academic scholars.

This article by Ameet Doshi, Diana Hicks, Matteo Zullo and Omar I. Asensio originally appeared on the LSE Impact of Social Sciences blog as “Who Uses Open Access Research? Evidence from the use of US National Academies Reports” and is reposted under the Creative Commons license (CC BY 3.0).

The open access community has heretofore largely focused on overcoming the economic, legal and technological hurdles to create sustainable pathways to research. However, understanding and using scholarly research is non-trivial. Reading scholarly work more often than not requires specialized grounding in disciplinary concepts in order to parse the language of the domain. Is someone who may not have a strong grounding in the language and theory of a subdiscipline willing to take the time and effort to overcome those barriers? Furthermore, the shift towards open access comes with significant costs to institutions and authors, as well as risks for smaller non-profit publishers. Is the global movement towards OA worth the risk to the established edifice of scholarly publishing and, by extension, to the advance of science itself? Specifically, what are the returns that accrue to society for moving publications to the open access model? It has now been twenty years since the Budapest declaration. What can we actually say about the public benefits of a more open scientific publishing ecosystem?

To help answer these questions we analyzed data from the US National Academies of Sciences, Engineering and Medicine (NASEM) by classifying 1.6 million US-based comments about how NASEM’s consensus study reports are used by the public. NASEM’s reports consist of authoritative, independently researched, consensus-based analyses on policy issues across domains. Since Abraham Lincoln first chartered the National Academy of Sciences in 1863, NASEM’s consensus study reports have served as influential scientific evidence for policymakers. The most downloaded reports are built on social science expertise in education and policy, in addition to medical knowledge. All consensus reports were made open access in 2011, and downloaders are prompted with a request to “please take a moment and tell us how you will be using this PDF.” The paper applies deep learning and natural language processing to label over a million comments, a task that would otherwise have required an inordinate amount of time and resources to annotate accurately. The classifier is built on Google’s BERT, a transformer-based neural network that uses bidirectional training based on the well-known attention mechanism to overcome limitations of the one-directional approaches commonly used for text classification.
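For readers curious about what “bidirectional” means here, the following is a minimal sketch of scaled dot-product self-attention in plain Python. It is an illustration only, not the authors’ code or BERT itself: the token list, the two-dimensional embeddings and the function names are hypothetical, and a real model adds learned query, key and value projections, multiple heads and many layers. The point it shows is that a one-directional model lets each word look only at the words before it, while a bidirectional model lets each word draw on the whole comment.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(vectors, causal=False):
    """Scaled dot-product self-attention weights.

    vectors: one toy embedding (list of floats) per token, used here as
    both query and key (a simplification; real models learn projections).
    With causal=True, position i may only look at positions <= i, which
    is the one-directional case. Otherwise every token sees every token.
    """
    dim = len(vectors[0])
    weights = []
    for i, query in enumerate(vectors):
        visible = range(i + 1) if causal else range(len(vectors))
        scores = [
            sum(q * k for q, k in zip(query, vectors[j])) / math.sqrt(dim)
            for j in visible
        ]
        row = softmax(scores)
        # Positions that are hidden from this token get zero weight.
        weights.append(row + [0.0] * (len(vectors) - len(row)))
    return weights

# Hypothetical 2-d embeddings for a three-token download comment.
tokens = ["for", "my", "classroom"]
embeddings = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]

bidirectional = attention_weights(embeddings)
one_directional = attention_weights(embeddings, causal=True)

for token, row in zip(tokens, bidirectional):
    print(token, [round(w, 3) for w in row])
```

In the one-directional case the first token can attend only to itself, so it learns nothing from “classroom”; in the bidirectional case it places some weight on every word, which is the property the paragraph above refers to.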

Our classification project reveals that the impact of these reports extends far beyond the research community (see figure). We find that half of all report downloads are used for non-academic purposes, including to improve the provision of services by medical professionals, local and regional planners, public health workers, and veterans’ advocates, to name just a few of the 64 total categories of report use. Teachers, school administrators and teachers’ coaches make heavy use of NASEM reports on STEM education and how people learn. Other notable reports with their prominent users included Dying in America (chaplains), Nutrient Requirements for Beef Cattle (farmers), and Best Care at Lower Costs (clinicians and hospital administrators).

Concentric circles graphic showing who requested NASEM reports

This picture suggests that taxpayer investments in open access to high-quality science do indeed pay dividends to society, broadly and at the local service level. The results also indicate a public motivated to improve their engagement with patients, students, clients, and fellow citizens, and to seek out (and share) the best available evidence to solve problems at the coalface. This motivation by non-researchers to use and apply consensus-based research appears to overcome the challenge of parsing specialist jargon in technical writing. This finding also contrasts with the contemporary notion of a public completely misinformed by social media, though we do not dispute the very real issues surrounding social media manipulation.

The authors, clockwise from top left: Ameet Doshi, Diana Hicks, Omar I. Asensio and Matteo Zullo

Additionally, we detect signals of “serious leisure” in the NASEM data, comprising about 4,300 comments. Serious leisure is a sociological concept introduced by Robert Stebbins to describe unpaid activities by individuals who engage in a systematic, self-directed pursuit of knowledge. The serious leisure devotee aims to continually expand understanding of their respective domains. These people downloaded reports relevant to wild edible plants (Lost Crops of the Incas: Little-Known Plants of the Andes with Promise for Worldwide Cultivation), astronomy (New Worlds, New Horizons in Astronomy and Astrophysics), and ham radio (Handbook of Frequency Allocations and Spectrum Protection for Scientific Uses).

The implications of this work are far-reaching. On the methodological side, the paper demonstrates the ability of machine learning techniques to enhance social science research and generate insights at scale. The techniques continue to improve, enhancing their precision and promising to exceed human ability to consistently make the subtle distinctions necessary to classify very large amounts of text for research purposes. Members of the research team have been expanding the application of transformer-based algorithms into other social science areas, including understanding consumer behavior at scale with electric vehicle charging and smart meters.

Open access repositories require significant resources, both technological and human, to sustain and innovate. The National Academies Press, for example, has developed an engaging user interface to incentivize browsing and ease of access to NASEM publications. The PubMed Central server, developed and managed by the US National Institutes of Health (NIH), requires millions of dollars per year to operate. Our research indicates there is an identifiable payoff to society for these taxpayer investments into people, technology and design to support OA publishing.

As we note in the paper, “[o]ur results establish the existence of demand for high-quality information by the public and that such knowledge is widely deployed to improve provision of services. Knowing the importance of such information, policy makers can be encouraged to protect it.” Librarians and open access advocates have long presupposed that open access to high-quality scientific knowledge could and should be viewed as a public good. Our empirical research suggests that the initial utopian aspirations regarding the public use and societal impact of OA may indeed rest on sound footing.

DRAWN FROM

This post draws on the authors’ article, “Widespread use of National Academies Consensus Reports by the American Public,” published in Proceedings of the National Academy of Sciences.

Ameet Doshi, Diana Hicks, Matteo Zullo and Omar I. Asensio

Ameet Doshi is head of the Donald E. Stokes Library at Princeton University and a doctoral student in the School of Public Policy at the Georgia Institute of Technology. Doshi’s research focuses on how non-scientists use open access research, and he has served on the American Library Association's Center for the Future of Libraries advisory board. Diana Hicks is a professor in the School of Public Policy, Georgia Institute of Technology, specializing in metrics for science and technology policy. She was the first author on the Leiden Manifesto for research metrics published in Nature. She co-chairs the biennial international Atlanta Conference on Science and Innovation Policy. Matteo Zullo is a PhD candidate at the Georgia Institute of Technology focusing on educational analytics, science & technology policy, standardized testing, and AI. Omar I. Asensio is an assistant professor in the School of Public Policy at the Georgia Institute of Technology and director of the Data Science & Policy lab. His research focuses on the intersection of big data and public policy, with applications to energy systems and consumer behavior, digital innovation, smart cities, resource conservation and machine learning in transportation and electric mobility.
