Our Open-Source Tool Allows AI-Assisted Qualitative Research at Scale
Social science relies on two distinct approaches: quantitative research, which analyses large sets of numerical data, and qualitative research, which explores individual experiences and perspectives in depth.
Interviews, the traditional method of qualitative research, require significant human and financial resources and are limited in terms of the number of participants that can be reached. However, large language models (LLMs) offer an opportunity to conduct interviews at scale, thereby creating a bridge between the richness of qualitative data and the statistical power of quantitative data. Specifically, the LLM can act as an interviewer and interact with human respondents recruited on a standard online platform, such as Prolific. This makes it possible to conduct interviews with thousands of respondents in a matter of hours.
We have developed and launched an easy-to-use platform for conducting large-scale qualitative interviews, based on artificial intelligence in just this way. A chat interface allows the respondent to interact with a LLM that collects their responses and generates new questions. The platform is adaptable to researchers’ needs and capable of handling a wide range of topics while maintaining a general structure that ensures the reliability of the interviews.
The conversational agent’s prompt incorporates best practices from sociological literature, such as the importance of conducting interviews in a non-directive manner, encouraging participants to freely express their views, while posing follow-up questions to ensure clarity on each point raised by the respondent. For instance, it gathers “palpable evidence” to deeply understand topics that respondents may sometimes express in abstract terms. Moreover, the conversational agent demonstrates “cognitive empathy,” using follow-up questions to try to understand the respondent’s perspective as closely as they understand it themselves, insofar as doing so is useful to deepen its understanding of the main theme of the interview.
We analyzed its performance against traditional qualitative interview practices, both from the perspective of interviewees and qualitative research experts. First, we asked a team of sociology PhD students from Harvard and the London School of Economics, who specialize in qualitative methods, to assess the quality of interviews based on the interview scripts. The AI-led interviews were rated approximately comparable to an average human expert (under the same conditions). Additionally, during the interviews, respondents were asked to evaluate the quality of the interaction. A vast majority of participants reported enjoying their interaction with the conversational agent and preferred this mode of interview over open text fields. They felt that it captured their thoughts well, tended to write significantly more words compared to open text fields, and found this interview method to be non-judgmental.
We also explored the capacity of artificial intelligence to analyze different kinds of qualitative questions by using it to study three different subjects areas: subjective issues, “meaning in life,” sensitive issues like political opinions, and complex areas such as decision-making for occupational and educational choices.
In the first instance, we aimed to identify key activities that serve as major sources of meaning in life (defined here as a subjective sense that one’s life makes sense, has purpose, and matters to others) and the states individuals associate with meaning in life, with particular interest in differences across socio-demographic groups. AI-led interviews revealed more information than conventional open text fields, with a 142 percent increase in the number of words respondents wrote. They helped uncover significant activities and subjective states that were not easily anticipated, such as the importance of pet care and companionship, which seemed to emerge as a source of meaning in life approximately as frequently as spirituality and religion. The large sample of respondents enabled the documentation of variations between different socio-economic and political groups. For example, Trump voters were more likely to mention religion as a source of meaning compared to Biden voters.
The second application examined political opinions and the reasons behind electoral choices. Here, participants preferred interacting with a conversational agent perceived as a non-judgmental entity, rather than a human interviewer, allowing them to feel more comfortable and to freely express their views, especially on sensitive political issues. Focusing on the French legislative elections, interviews conducted a week before the elections, they show the adaptability of the conversational agent to languages other than English. The study also revealed interesting insights about political polarization. Depending on their political affiliation, voters had different views on political priorities and issues. They are attracted to specific policies as solutions to issues they consider important, rather than being guided by the general ideology of their preferred party.
In the third application, we explored key factors in decision-making using the example of educational and occupational choices, particularly in STEM (science, technology, engineering and mathematics) fields and innovation careers, with a focus on understanding why certain groups are underrepresented. Consistent with previous research on this topic, financial and sociological factors (such as family, mentors, and childhood interests) were crucial, with a major role played by individual interest and passion.
Finally, ethical concerns must be mentioned. To participate in the interviews, respondents must agree to their data being seen by the firms that own the language models, which may raise privacy concerns for some participants. As smaller, local language models become increasingly capable, it is likely that high-quality interviews can be conducted with them, mitigating potential privacy issues.
This approach due to its flexibility and open-source platform, can offer researchers a new tool to conduct large-scale, rigorous interviews on a variety of topics, while allowing researchers to maintain full control over the prompt and interface. The general instructions provided to the LLM are perceived differently in terms of quality, depending on individual researchers’ preferences. The platform’s flexibility allows researchers from different schools of thought to tailor the interviews to their preferences, for example, by focusing on narratives rather than concrete examples. We propose several avenues for improvement and consider the implications of making the conversation, currently conducted via text chat, more fluid by adding a voice option.