Business and Management INK

Machine Learning Research Requires Smaller Sample Sizes than Previously Thought

August 13, 2024

In this post, authors Louis Hickman, Josh Liff, Caleb Rottman, and Charles Calderwood outline the inspiration behind their recently published paper, “The Effects of the Training Sample Size, Ground Truth Reliability, and NLP Method on Language-Based Automatic Interview Scores’ Psychometric Properties,” which appears in Organizational Research Methods.

Natural language processing (NLP) and supervised machine learning (ML) are increasingly used to score latent constructs from natural language text in both research and practice (e.g., personnel selection). However, many practitioners, researchers, and reviewers assume that ML requires thousands of observations. Our research aimed to empirically address this assumption, given that editors and reviewers have previously pressed us to obtain larger sample sizes.

Thus, we designed an experiment to understand how training sample size, NLP method, and the reliability of the target variable affect the convergent validity and test-retest reliability of ML scores in the context of automatically scored interviews. We were surprised to find that, for most target variables, neither validity nor reliability improved much once the ML training data included at least 500 observations. Researchers can therefore pursue NLP and ML projects with just hundreds of observations, significantly lowering the barrier to entry in this emerging area of research.
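The learning-curve design described above can be sketched in a few lines of scikit-learn. This is a hypothetical illustration on synthetic data, not the authors' code: the `make_corpus` helper, the vocabulary, and the signal word "team" are invented stand-ins for interview transcripts and human ratings. The idea is to train a supervised scoring model at increasing sample sizes and track convergent validity, i.e., the correlation between model scores and the ratings on a held-out set.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

# Toy corpus: "transcripts" whose human rating loosely tracks
# whether the signal word "team" appears (plus rating noise).
vocab = [f"w{i}" for i in range(50)] + ["team"]

def make_corpus(n):
    docs, scores = [], []
    for _ in range(n):
        words = rng.choice(vocab, size=15)
        docs.append(" ".join(words))
        scores.append(2.0 * ("team" in words) + rng.normal(0, 1))
    return docs, np.array(scores)

train_docs, train_y = make_corpus(1000)
test_docs, test_y = make_corpus(300)

validities = {}
for n in (100, 250, 500, 1000):
    # Binary unigram features (word present/absent) fed to a ridge
    # regression, echoing the simpler NLP method in the study.
    vec = CountVectorizer(binary=True)
    X_train = vec.fit_transform(train_docs[:n])
    model = Ridge(alpha=1.0).fit(X_train, train_y[:n])
    preds = model.predict(vec.transform(test_docs))
    # Convergent validity: correlation with held-out "human" ratings.
    validities[n] = np.corrcoef(preds, test_y)[0, 1]

for n, r in validities.items():
    print(f"n={n}: convergent r = {r:.2f}")
```

With real interview data the curve is noisier, but the same loop makes the paper's question concrete: how quickly does the validity curve flatten as n grows?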

Researchers and practitioners are already using large language models (LLMs), such as those that power ChatGPT, to score psychological constructs from text with little (few-shot) to no (zero-shot) additional training data. Even so, many organizations will continue to opt for traditional, supervised ML models for scoring open-ended content because they control the model and its training data and can adjust the model to improve fairness. By contrast, many LLM developers have not disclosed their training data, few established methods exist for reducing bias in LLM outputs, and most organizations lack the (considerable) resources to train an LLM from scratch.

Figure: Observed convergent correlations for constructs scored at the interview level, shown for two NLP methods, binary unigrams (left) and RoBERTa with ridge regression (right). Charts created by Louis Hickman.

LLMs have lowered the barrier to getting started in this area of research, as they can help researchers and practitioners write code and learn to program. Combining NLP with ML is a powerful way to capitalize on the voluminous unstructured data relevant to organizations. Additional guidance and example code for getting started are available in the following publications.

Hickman, L., Herde, C. N., Lievens, F., & Tay, L. (2023). Automatic scoring of speeded interpersonal assessment center exercises via machine learning: Initial psychometric evidence and practical guidelines. International Journal of Selection and Assessment, 31(2), 225-239. https://doi.org/10.1111/ijsa.12418

Hickman, L., Saef, R., Ng, V., Tay, L., Woo, S. E., & Bosch, N. (2021). Developing and evaluating language-based machine learning algorithms for inferring applicant personality in video interviews. Human Resource Management Journal, advance online publication. https://doi.org/10.1111/1748-8583.12356

Hickman, L., Thapa, S., Tay, L., Cao, M., & Srinivasan, P. (2022). Text preprocessing for text mining in organizational research: Review and recommendations. Organizational Research Methods, 25(1), 114-146. https://doi.org/10.1177/1094428120971683

Rottman, C., Gardner, C., Liff, J., Mondragon, N., & Zuloaga, L. (2023). New strategies for addressing the diversity–validity dilemma with big data. Journal of Applied Psychology, 108(9), 1425-1444. https://doi.org/10.1037/apl0001084

Zhang, N., Wang, M., Xu, H., Koenig, N., Hickman, L., Kuruzovich, J., Ng, V., Arhin, K., Wilson, D., Song, Q. C., Tang, C., Alexander, L., & Kim, Y. (2023). Reducing subgroup differences in personnel selection through the application of machine learning. Personnel Psychology, 76(4), 1125-1159. https://doi.org/10.1111/peps.12593

Louis Hickman (pictured) is an assistant professor of industrial-organizational psychology at Virginia Tech. He received his PhD from Purdue University, and his research interests include applications of artificial intelligence, machine learning, and natural language processing in organizations, including addressing concerns about algorithmic bias and how these technologies influence workplace behavior. Josh Liff (PhD) is the director of research and development at HireVue, where he leads the organization's industrial-organizational psychology research team to guide the development and validation of next-generation assessment solutions that blend the state of the art in industrial-organizational psychology with data science. Caleb Rottman (PhD) is a senior data scientist and technical lead at HireVue and has over 10 years of research experience in algorithms, statistics, data science, and applied mathematics. Charles Calderwood is an associate professor of industrial-organizational psychology at Virginia Tech. He received his PhD in psychology from the Georgia Institute of Technology and has research and methodological interests in occupational health, commuting, and dynamic multilevel statistical techniques.

