Free the Data to Move Towards Open Science
Open science can be incorporated into every step of the scientific process and emphasizes data sharing. Publicly available data can be reused by other scientists, for example in synthesis research, and can therefore have a much greater impact than data confined to their creator’s original analysis or purpose.
With billions of dollars dedicated each year to scientific research, funders are increasingly pushing to maximize the impact of the research they support by requiring or encouraging data sharing. In particular, when scientists receive public funds, their data are considered a public good and therefore carry an expectation of public access.
Additionally, new tools are emerging that make data annotation and sharing easier to incorporate into the research process.
However, although tools and protocols are changing to improve data sharing among researchers, my colleagues and I found that few data are actually made public in practice.
In an article published in PLOS ONE, our team of scientists tested compliance with funder-imposed data-sharing requirements among environmental science projects over a twenty-year period. We were able to obtain data from only 26 percent of the funded projects.
As scientists, we believe that everyone in the scientific community can help increase data publication and sharing, and that doing so is our responsibility if we want research to be more efficient.
In our analysis, data availability differed by field of study. It was influenced by factors such as the time required to prepare data, whether a field has established data collection protocols and standardized methods, how sensitive the data are, and how easily they can be interpreted.
Nonetheless, we argue that a fundamental obstacle to data sharing is the absence of a professional reward structure, such as recognition that data citations are as valuable as paper citations. Without that recognition, scientists have little incentive to spend time formatting, annotating, and preparing their data for sharing.
Some publication platforms are starting to assign digital object identifiers (DOIs) to published data, just as journals do for articles, providing a reliable way to attribute them. Ultimately, however, it is up to the scientific community to treat data citations as a scientific currency as valuable as paper citations, and to encourage and practice citing data as part of its overall scientific output.
To move toward more open science, scientists must take on some of the responsibility for learning about the benefits of data sharing and for incorporating open science methods into their daily work. Creating data in a way that others – and a future you – can easily access and interpret may require an extra step up front, but it will save work down the road.
Using data formats that are easy to share and read across platforms, including open-source ones – for example, CSV files rather than Microsoft Excel workbooks – and publishing data in open archives will also save time when other researchers or funders request the data. Well-defined data preparation protocols can also speed up publication, because many journals, like funders, now require proof of data publication.
Funders can also make changes that will incentivize data sharing. Many have long required their awardees to make data publicly available without following up on these requirements or providing any resources to help the researchers preserve their data.
Some funders, such as the National Science Foundation (NSF), are starting to enforce data-sharing compliance by creating dedicated data archives for the projects they fund and by providing technical assistance to awardees. For example, the Arctic Data Center houses all Arctic data collected under NSF grants and provides awardees with a team of technicians who help with data attribution, metadata creation, formatting, and publication. NSF also requires funded Arctic researchers to publish their data in the archive, or to show that the data have been published in a comparable archive, before it awards further funding.
This two-fold approach not only makes data publication easier but also gives funders a simple way to confirm compliance with data-sharing requirements.
Data sharing is central to open science and efficient research. In the ways outlined above, scientists, funders, and publishers alike can play important roles in freeing data.
Thinking about data as a valuable scientific currency is an important step forward, and it requires support from the entire scientific community. It starts with how you think about and treat your own data and other people’s.