Watch Gary King: Do We Need a Big Data Treaty?
For years political scientist Gary King – director of the Institute for Quantitative Social Science at Harvard University – has preached to his peers for a restructuring of the social sciences that would include “larger scale, collaborative, interdisciplinary, lab-style research teams” with big data analysis in their DNA.
Increasingly, King is making similar arguments to others outside of academe but involved in accessing big data – business and government. The differing interests of each partner in the triad require a modus vivendi that currently does not exist.
At a Capitol Hill event in May — video below — sponsored by SAGE Publishing (the parent of Social Science Space), King called on the policymakers and government officials in the audience to consider enacting a “treaty” on the collection, retention, access and sharing of big data that could serve the needs of the academic world, the commercial world and government while protecting the interests of the public.
“All three partners would be way better off,” King insists.
“If we could sign this ‘treaty,’” he says, companies could gain access to the data from government and from their own activities without fear of “the random terror of government coming in and saying, ‘You know what? After two years, just delete all your data.’” They just want it to be regulated in some way, he adds, that offers the protection of predictability.
Researchers, too, want access to that commercial and governmental data. “We would all live longer, healthier, happier lives and most of the issues that members of Congress care about are actually the issues that social scientists want to study,” King says, if that data were routinely available. Plus, by bringing in more eyes, research would improve. “The key reasons social sciences are moving from studying problems individually … to the scientific model where we’re actually solving problems, is because of the community. It is much easier to fool ourselves than it is to fool our community.”
The treaty would recognize the key point that King has been making about big data for years. He often gives talks with names like “Big Data is not about the data.” In his Big Deal talk, however, King hammered home the need to understand that “the value is not the data. It’s not the big, it’s the analytics.”
Data is easy to come by, he insists, and is in fact a by-product of improvements in information technology. Most Fortune 500 firms have now, he notes, become data consumers, producers and analyzers (“Facebook,” he says, “is basically a social science innovation”). Even if you choose to ignore this now commoditized flow, by the end of the year you’ll still have more data than you started the year with.
“What are you going to do with all that data? It’s not that helpful, by itself, because you have to manage it. It’s valuable, so you have to keep it. … The value is the analytics, the revolution is the analytics. The revolution, that thing that we did not know how to do before, but that we are learning how to do now, is how to make the data actionable.”
But it’s safe to say that the focus on analysis as opposed to gathering has not captured the public imagination to the same degree as has the rubric ‘big data,’ which has such a talismanic hold that it’s usually capitalized – Big Data – as if it were a living entity. Quantitative social science, for which big data could be seen as essentially a proxy, has not made similar inroads.
And King makes no bones about the utility of the term “big data.” He says the term helps the public “get” the revolution in commoditized data and the computational efforts involved in extracting value from that data. “My mom,” he says, “now thinks she understands what I do.” The very popularity of the term “big data,” despite its increasingly mixed reception among actual data and computational social scientists, is opening doors among the public and policymakers.
To demonstrate this point he cites Moore’s Law, which predicts (successfully so far) that computer speed and power will double every 18 months. “That’s nothing,” King enthuses, compared to a competent grad student beavering away for an afternoon, who can create a thousand-fold increase by crafting algorithms to plow through these avalanches of data.
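To put King's comparison in rough numbers, here is a minimal sketch of the arithmetic, assuming the 18-month doubling period cited above holds steadily. The function name and its default are illustrative choices, not anything King presented.

```python
import math

def years_to_match(speedup: float, doubling_period_years: float = 1.5) -> float:
    """Years of steady hardware doubling needed to reach a given speedup.

    Assumes performance doubles once every `doubling_period_years`
    (the 18-month figure quoted in the article).
    """
    doublings = math.log2(speedup)  # a 1000x gain is about 10 doublings
    return doublings * doubling_period_years

# How long would hardware alone take to deliver a thousand-fold gain?
years = years_to_match(1000)
print(f"{years:.1f} years")  # prints "14.9 years"
```

Under that assumption, waiting for hardware alone to deliver a thousand-fold gain would take roughly 15 years, which is the gap King says a better algorithm can close in an afternoon.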
King told the story of a Harvard colleague who every year faced increasingly monstrous piles of data. One year the data exceeded what his computer could hold. The academic asked the university IT shop to “spec out a new computer,” and the proposed bill for that cyber behemoth came back for $2 million. King and a student “intercepted” this exchange and worked on crafting an algorithm “for almost two hours.”
The result? The initial colleague can now run his mountain of data on his laptop — and see results in 20 minutes or so.
“The most amazing thing about this story?” King asks. “It’s that it’s not that amazing. It happens all the time. The innovation is the analytics.” Even “off the shelf” analytics provide a huge improvement in generating usable information compared to none, says King, but the astronomical leap comes from crafting custom analytical solutions – hardly a surprising statement from the head of a computational lab. To show the truth of his assertion, in the video below King outlines some of the successes his lab has had in unraveling nettlesome issues by applying custom analytics to existing piles of data. In his telling, these real-world examples demonstrate the value of signing on to this treaty.