Analyzing Culture with Google Books: Is It Social Science?
In a recent opinion piece in Miller-McCune Magazine, Anita Guerrini argues that graphing terms found among the 5 million volumes of the Google Books project may turn up amusing facts, but that this pursuit, dubbed ‘culturomics,’ is not the same as doing the work of a historian.
Earlier this year, a group of scientists, mostly in mathematics and evolutionary psychology, published an article in Science titled “Quantitative Analysis of Culture Using Millions of Digitized Books.” The authors’ technique, called “culturomics,” would, they said, “extend the boundaries of rigorous quantitative inquiry to a wide array of new phenomena spanning the social sciences and the humanities.” The authors employed a “corpus” of more than 5 million books (500 billion words) that have been scanned by Google as part of the Google Books project. These books, the authors assert, represent about 4 percent of all the books ever published, and will allow the kind of statistically significant analysis common to many sciences.
This sounds impressive. The authors point out that 500 billion words are more than any human could reasonably read in a lifetime. Their main method of analysis is to count the number of times a particular word or phrase (referred to as an n-gram) occurs over time in this corpus. (Try your own hand at n-grams here.) Their full data set includes over 2 billion such “culturomic trajectories.” One of the examples the authors give is to trace the usage of the year “1951.” They note that “1951” was not discussed much before the actual year 1951, that it appeared a lot in 1951, and that its usage dropped off after 1951. They call this evidence of collective memory.
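To make the counting method concrete, here is a minimal sketch in Python. It is not the Science authors' pipeline. The toy corpus, the function names, and the choice to report relative frequency (matches divided by all n-grams of the same length published that year) are assumptions made for illustration only.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def trajectory(corpus, phrase):
    """Map each year to the relative frequency of `phrase` in that year's texts.

    `corpus` is an iterable of (year, text) pairs. Tokenization is a plain
    lowercase whitespace split, so punctuation is not handled.
    """
    target = tuple(phrase.lower().split())
    n = len(target)
    hits = Counter()    # matches of the phrase per year
    totals = Counter()  # all n-grams of the same length per year
    for year, text in corpus:
        grams = ngrams(text.lower().split(), n)
        totals[year] += len(grams)
        hits[year] += sum(1 for g in grams if g == target)
    return {y: hits[y] / totals[y] for y in sorted(totals) if totals[y]}


# Hypothetical toy corpus: (publication year, text). Not real data.
TOY_CORPUS = [
    (1950, "plans for 1951 are under way"),
    (1951, "in 1951 the fair opened and 1951 was busy"),
    (1951, "the year 1951 began quietly"),
    (1952, "we recall 1951 fondly"),
]

result = trajectory(TOY_CORPUS, "1951")
for year, freq in result.items():
    print(year, round(freq, 3))
```

A sketch like this yields one number per year for one phrase, which is all a trajectory is. What such a curve means is the interpretive question the counting itself cannot answer.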
I initially reacted to this article with skepticism. As I read more — including a recent piece (one might call it a puff piece) in Nature on one of the co-authors, Erez Lieberman Aiden, in which he was dubbed “the prophet of digital humanities” — my skepticism became stronger. I think culturomics is a nifty tool, but we need to be cautious and critical about this kind of digital data and about claims that culturomics could make “much of what [historians] do trivially easy.” Historians do much more than follow trajectories, so I am not so sure that culturomics will lead to a new way of doing historical work. It’s not the game-changer it’s been claimed to be.
I would not call myself a Luddite: I use digital resources all the time, in my research and my teaching. I have hundreds of PDFs of books that I use in my research, downloaded from a variety of online sources: Early English Books Online, Eighteenth Century Collections Online, Gallica (the digital service of the French National Library), and yes, Google Books.
But when I read the Science article, I was immediately struck by what seems to me to be a fundamental flaw in its methodology: its reliance on Google Books for its sample….