Visualizing Social Media Analysis
This article originally appeared at MethodSpace, the convening place for the social and behavioral science research methods community.
***
There’s a lot of hope attached to the idea that big data sources like social media can drive important social science. In a small-scale test of that big idea, software developers, requirements engineers and social scientists collaborated to see if they could make practical use of Twitter to learn about patient experiences of cystic fibrosis. They detailed their experiences in a SAGE Research Methods case study, “User-Driven Data Capture: Locating and Analysing Twitter Conversation about Cystic Fibrosis without Keywords.”
Here, two of the five co-authors of that case, social scientist Phillip Brooker and computer scientist Timothy Cribbin, jointly describe how “even a 140-character Tweet can tell you a lot about people’s experiences with healthcare” and how they were able to harvest that information for their research. (The other researchers were Julie Barnett, Alexandra R. Lang and Jennifer Martin.)
Brooker contributed to the study as a trained sociologist at the University of Bath whose work often centers on ethnomethodologically flavored research. Before coming to Bath he helped develop Chorus, software that allows social scientists to analyze Twitter data. Cribbin, meanwhile, led the development of the algorithms and software used in the research. Although a lecturer in the Department of Computer Science at Brunel University London, his undergraduate studies were in psychology.
Here they talk about the ethical challenges of working in a rapidly changing landscape, why it’s important to be able to visualize what your analysis is finding, and why you shouldn’t let your analysis be driven by some sort of ‘black box’ that you as the researcher don’t fully understand.
Could you talk briefly about the intersection of social media technology and healthcare?
Social media holds a lot of opportunities for healthcare in terms of communication, empirical evaluation and possibly even diagnosis. On the communication side, providing tailored information, widening access and the provision of social support are key areas waiting to be exploited. However, our interest lies in leveraging the activity that is already ‘out there’ to inform key activities like the development of products and the formation of policy. Our paper on user-following shows how to tap into social media data to gain insight into the discussions already ongoing amongst communities of interest (patients and practitioners alike). Believe it or not, even a 140-character Tweet can tell you a lot about people’s experiences with healthcare! And this is exactly what we wanted to demonstrate in our piece, which looks at the everyday experiences of cystic fibrosis sufferers.
What was this individual study about? What did you want your final dataset to look like and what did you do to make it visual?
This study was primarily methodological. Though we wanted to demonstrate the user-following approach with real data about a genuine research interest (i.e. healthcare, and specifically cystic fibrosis), it was about demonstrating a qualitative way to dig into Twitter data and get at something that holds very tangible meaning and relevance for the people tweeting. These people were reporting on their struggles and experiences with a chronic condition, which for healthcare researchers and healthcare tech manufacturers might be a valuable source of insight leading to beneficial developments. So, the interest for me in this study was in demonstrating the potential for social media to produce meaningful results. In this sense, what we wanted to do was clearly show how you could build up a collection of Twitter users’ timelines into something that resembles (and can be treated as) a “dataset”, and then go further and show some techniques for visualizing that dataset as a collection of topics and themes that can be navigated in a pointed way, making it easier to derive these insights.
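As a toy illustration of that idea (not the authors’ actual Chorus pipeline; the usernames and tweet texts below are invented placeholders), per-user timelines might be flattened into a single analyzable dataset like this:

```python
from dataclasses import dataclass

# Invented placeholder records: a real pipeline would pull these from the
# Twitter API rather than hard-coding them.
@dataclass
class Tweet:
    user: str
    text: str

def build_dataset(timelines):
    """Flatten {user: [tweet texts]} into one list of Tweet records,
    so the whole corpus can be analysed as a single dataset."""
    return [Tweet(user=user, text=text)
            for user, texts in timelines.items()
            for text in texts]

timelines = {
    "user_a": ["Physio again this morning.", "New nebuliser arrived today."],
    "user_b": ["Clinic appointment went well."],
}
dataset = build_dataset(timelines)
print(len(dataset))  # 3 tweets across 2 users
```

The point of the flattening step is simply that once every tweet is a uniform record tagged with its author, the collection can be treated as one dataset for topical analysis.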
What tools do you use and how did you visualize your findings using those tools?
For this study, we used a tool we developed ourselves called Chorus, which can both collect Twitter data and visualise it in various ways (see www.chorusanalytics.co.uk for further details on the software). For this project, we drew on Chorus’ bespoke user-following data collection methods, and then used its ‘cluster explorer’ functionality to build topical ‘cluster maps’ that show the key themes and topics within the data corpus (see images below for examples of these cluster maps, both drawn from the cystic fibrosis research project). This process of collecting data and building visual models of it is all part of the “socio-technical assemblage” we try to represent throughout the analysis; the original research case aims to account for precisely how it all emerges over the course of the research process.
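Chorus’ actual cluster-map algorithm isn’t reproduced here, but the underlying idea of a topical cluster (grouping tweets that share vocabulary) can be sketched with a naive bag-of-words similarity. Everything below, including the threshold and the example texts, is an invented simplification:

```python
from collections import Counter
import math

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def cluster(texts, threshold=0.3):
    """Greedily assign each text to the first cluster whose accumulated
    vocabulary it resembles; start a new cluster otherwise."""
    clusters = []  # each: {"terms": Counter, "texts": [...]}
    for text in texts:
        bag = Counter(text.lower().split())
        for c in clusters:
            if cosine(bag, c["terms"]) >= threshold:
                c["terms"] += bag
                c["texts"].append(text)
                break
        else:
            clusters.append({"terms": bag, "texts": [text]})
    return clusters

texts = ["physio session today", "physio session cancelled", "new inhaler prescribed"]
groups = cluster(texts)  # two topical groups: physio talk vs. medication talk
```

A real cluster map would also lay the groups out spatially so related themes sit near each other; the sketch stops at the grouping step.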
Is this only applicable to Twitter?
Chorus as a tool is Twitter-only. However, the principle of user-following data as a different slice of data other than keyword searches stands across lots of different platforms. So certainly we’d be interested to hear about researchers taking these ideas and applying them to other platforms where user-groups that are relevant to various research questions might be identified – Facebook or Reddit for example.
It seems like drawing from social media means your raw data is delivered to your doorstep – but I suspect it’s not quite that simple. What innovative ways did you use to address that?
Social media data lacks structure and is neither neutral nor objective – you have to accept what it represents and then build your data collection and analysis around these caveats. Often any data you collect has gone through a whole array of processing before you even see it. With Twitter data, for instance, Twitter filters what you are allowed to collect in lots of different, undocumented ways. Beyond this, searching around a topic using keywords is inherently problematic. First, it is very difficult to isolate the terms that define discussions relevant to your topic. Second, even with good keywords, the pool of data collected may be skewed by content that is not useful (e.g. spam) or by users who are disproportionately vocal on the subject.
For instance, on our cystic fibrosis project, a key thing we had to write about was how the data was constructed – the analysis only makes sense if you can see exactly how we built up our list of users to follow, how we whittled it down to users we thought might have some first-hand experience with cystic fibrosis (as a sufferer or family member of a sufferer and so on), and how we then dug into it with Chorus. So in this way, the whole research process becomes part of the analysis, which is a very different way of writing and thinking about research!
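The authors’ actual whittling-down was a qualitative judgement call, but the shape of such a filter can be sketched; the bio keywords, usernames and naive substring matching below are all invented for illustration:

```python
# Hypothetical markers of first-hand experience; a real study would derive and
# justify these qualitatively, and naive substring matching would also need
# care (e.g. "cf" matches inside unrelated words).
FIRST_HAND_MARKERS = {"cystic fibrosis", "cf warrior", "my son", "my daughter"}

def likely_first_hand(bio: str) -> bool:
    """Flag a user bio that suggests first-hand experience of the condition."""
    bio = bio.lower()
    return any(marker in bio for marker in FIRST_HAND_MARKERS)

bios = {
    "user_a": "Mum to a CF warrior. My son fights cystic fibrosis daily.",
    "user_b": "Crypto trader. DM for signals.",
}
kept = [user for user, bio in bios.items() if likely_first_hand(bio)]
```

Crucially, as the passage above stresses, whatever rule is used must be reported alongside the analysis, because it directly shapes who ends up in the dataset.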
What are the main ethical challenges of undertaking work like this?
The key challenge really is trying to manage standard ethical procedures for research (i.e. things like maintaining anonymity of participants, issues of consent and ownership of data, etc.) alongside the terms and conditions with which corporations like Twitter govern data usage.
All data that Chorus gathers from Twitter’s API is public by default and therefore, legally, it’s ‘fair game’ for analysis – but that’s not to say that we can collect and analyse it without thinking about standard research ethics protocols such as maintaining anonymity and ensuring consent (which includes deleting content that is subsequently deleted by users)! The ever-evolving nature of social media means it’s probably impossible to expect there to ever be a comprehensive set of ethical guidelines to check your research against.
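As a minimal sketch of what honouring those protocols can mean in practice (the field names and deletion list are invented, and a real pipeline would use a salted hash, since a bare hash of a username is trivially re-identifiable):

```python
import hashlib

def sanitise(tweets, deleted_ids):
    """Pseudonymise authors and drop tweets the user has since deleted."""
    clean = []
    for t in tweets:
        if t["id"] in deleted_ids:
            continue  # honour the user's deletion
        # Unsalted hash shown for brevity only; see caveat in the lead-in.
        pseudonym = hashlib.sha256(t["user"].encode()).hexdigest()[:10]
        clean.append({"id": t["id"], "user": pseudonym, "text": t["text"]})
    return clean

tweets = [
    {"id": 1, "user": "alice", "text": "Clinic tomorrow."},
    {"id": 2, "user": "bob", "text": "Deleted later."},
]
result = sanitise(tweets, deleted_ids={2})
```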
So we would very much encourage researchers in this area to maintain a ‘savviness’ about the changing nature of social media, and to keep it in the back of their minds as something that affects the research process from start to finish.
What broader takeaways would be applicable to researchers anywhere? What do you foresee happening in the nexus between future social media and social science research?
Lately there’s been a turn towards more visually oriented social media platforms (e.g. Instagram, Vine, Snapchat), and more generally there seems to be a tendency for the field to flit from one platform to the next as “the next big thing”… But a broader takeaway of our research, we hope, is that even on Twitter (the first social media platform to engage social science researchers in programmatic data collection), there are still a lot of new and innovative things to do. It’s not over yet!
So while it’s definitely a good thing for the field to engage with new platforms as they arise, I’d say it’s also very worthwhile exploring platforms (like Twitter) that we’ve been working with for years and that still have a very active and diverse user-base. More broadly, the future of research within this nexus lies in capitalizing on the relative strengths of both the social science and data science communities. This includes providing means to educate social scientists to exploit things like machine learning and visualization on the one hand, whilst teaching data scientists to understand the research interests and traditional methodologies on the other.
What advice would you offer anyone conducting similar research? What might you do differently?
Our advice to other social media researchers would be to think really carefully about their methods throughout their research, in terms of how any and all decisions made along the way affect and shape the resulting analysis. This is something all researchers should do in any field really, but it becomes especially relevant for a field like social media analytics, where researchers have to rely on algorithms and tools that might be operating behind the scenes in a “black box.” It’s vital to be aware of what these tools are doing in terms of shaping the phenomena you’re trying to look at!
The other bit of advice is to not get too hung up on ‘traditional’ social science data concerns like “validity” and “reliability” and so on… there has been a lot of work decrying the problems of Twitter data in this regard, but we find it more productive to ask what you can do with data like this than to bemoan it as a kind of flawed alternative to “offline” data sources (like you’d get through interviews or focus groups and so on). For me, the “online/offline” comparison doesn’t make a lot of sense, and it’s best to think about the data on its own terms – hopefully the work we’ve done here might help you to see how that can be done!