
Good Replication Standards Start With the Data

August 24, 2016

How can we create reliable and replicable political science data? A recent article in the American Political Science Review focuses on text analysis and suggests ways to make these data sound and reproducible.


This post by Nicole Janz originally appeared at her blog, Political Science Replication, under the title "Towards A More Comprehensive Replication Standard In Political Science: Reproducible Data Collection." It is reposted here with her permission.

Kenneth Benoit from the London School of Economics and his co-authors write in their article "Crowd-sourced Text Analysis: Reproducible and Agile Production of Political Data" that there is now a strong trend in political science for journals to ask authors to upload their data and software code for replication. However, there is still uncertainty about how to make the data collection itself reproducible.

Most journal guidelines require files so that replicators can reanalyze the given dataset and run or improve the software code. However, such a replication of an analysis "sets a far weaker standard than reproducibility of the data". The authors propose that a "more comprehensive" replication standard should involve replication of the data collection and production itself.

This is an excellent point. I have discussed earlier on my blog that good practice for data collection entails keeping detailed logs of the sources and of all procedures and decisions – including selecting, merging, transforming and cleaning the raw data. Without good practice in data collection, a replicator may be able to reproduce the analysis itself from a dataset uploaded by the authors – but it may be impossible to follow the data creation process.
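To make that concrete, here is a minimal sketch (in Python, with hypothetical file names and cleaning rules of my own invention, not taken from any particular study) of what scripted, logged data preparation can look like: every selection and transformation decision is performed in code and recorded in a log, rather than done by hand.

```python
# Minimal sketch of logged data preparation. File names, column names and
# cleaning rules are hypothetical examples, not from the article.
import csv
import logging

logging.basicConfig(filename="data_collection.log", level=logging.INFO,
                    format="%(asctime)s %(message)s")

def build_dataset(raw_path="raw_reports.csv", out_path="analysis_data.csv"):
    logging.info("Source file: %s", raw_path)
    with open(raw_path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    logging.info("Read %d raw records", len(rows))

    # Decision: drop records with no country code (logged, not silent).
    kept = [r for r in rows if r.get("country_code")]
    logging.info("Dropped %d records missing country_code", len(rows) - len(kept))

    # Decision: normalise the year field to an integer.
    for r in kept:
        r["year"] = int(float(r["year"]))
    logging.info("Normalised 'year' to integers for %d records", len(kept))

    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(kept[0].keys()))
        writer.writeheader()
        writer.writerows(kept)
    logging.info("Wrote %d records to %s", len(kept), out_path)

if __name__ == "__main__":
    build_dataset()
```

A replicator who receives the raw file, the script and the log can then retrace every step from source documents to analysis dataset.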

In their article, the authors show how reproducible data collection can be implemented by scholars who derive data from political texts. Most scholars in political science have come across and used such data in their training or research. I work a lot with human rights measurements such as the Political Terror Scale, which codes government human rights violations on a scale from 1 to 5 based on Amnesty International and U.S. State Department reports. I've also worked with data based on presidential speeches (although not quantitatively). Any time you use such data without fully knowing the production process, you may not be able to replicate (or at least understand) the data collection, and therefore you may not want to trust these data.

The authors go on to show how they used expert coders and large numbers of non-expert coders (via crowdsourcing) to code textual sources. By giving detailed criteria for how the texts were coded and the data produced, the authors demonstrate how a comprehensive replication standard can be implemented for this and any other political data collection process.
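As a toy illustration of the general idea (this is not the authors' implementation, and the scale, identifiers and aggregation rule below are assumptions for the example only), one can imagine each text unit being coded by several crowd workers and the codes being combined by a simple, scripted rule:

```python
# Toy illustration (not the authors' code) of aggregating multiple crowd codes
# per sentence into a single score, assuming each sentence is labelled by
# several independent coders on a hypothetical -1 / 0 / +1 scale.
from collections import defaultdict
from statistics import mean

# Hypothetical crowd judgements: (sentence_id, coder_id, code)
judgements = [
    ("s1", "c1", 1), ("s1", "c2", 1), ("s1", "c3", 0),
    ("s2", "c1", -1), ("s2", "c4", -1), ("s2", "c5", 0),
]

def aggregate(judgements, min_coders=3):
    """Average the codes for each sentence, keeping only sentences
    coded by at least `min_coders` independent coders."""
    by_sentence = defaultdict(list)
    for sentence_id, _coder_id, code in judgements:
        by_sentence[sentence_id].append(code)
    return {s: mean(codes) for s, codes in by_sentence.items()
            if len(codes) >= min_coders}

print(aggregate(judgements))
# e.g. {'s1': 0.667, 's2': -0.667} (rounded)
```

The point is not the particular aggregation rule but that every step, from the sampling of text units to the scoring, is documented and scripted, and can therefore be reproduced or re-run by others.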

I highly recommend reading the full article:

Kenneth Benoit, Drew Conway, Benjamin E. Lauderdale, Michael Laver and Slava Mikhaylov (2016). "Crowd-sourced Text Analysis: Reproducible and Agile Production of Political Data." American Political Science Review, 110, pp. 278-295. doi:10.1017/S0003055416000058.

