Machine Learning Research Requires Smaller Sample Sizes than Previously Thought
In this post, authors Louis Hickman, Josh Liff, Caleb Rottman, and Charles Calderwood outline the inspiration behind their recent academic paper, “The Effects of the Training Sample Size, Ground Truth Reliability, and NLP Method on Language-Based Automatic Interview Scores’ Psychometric Properties,” published in Organizational Research Methods.
Natural language processing (NLP) and supervised machine learning (ML) are increasingly used to score latent constructs from natural language text in both research and practice (e.g., personnel selection). However, many practitioners, researchers, and reviewers assume that ML requires thousands of observations. Our research set out to test this assumption empirically, in part because editors and reviewers have pressed us in the past to obtain larger sample sizes.
Thus, we designed an experiment to understand how training sample size, NLP method, and the reliability of the target variable affect the convergent validity and test-retest reliability of ML scores in the context of automatically scored interviews. We were surprised to find that, for most target variables, neither validity nor reliability improved much once the ML training data included at least 500 observations. Researchers can therefore pursue NLP and ML projects with just hundreds of observations, significantly lowering the barrier to entry in this emerging area of research.
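To make the design concrete, here is a minimal, hypothetical sketch of a learning-curve analysis: train a supervised model on progressively larger subsamples and track convergent validity (the correlation between ML scores and human ratings) on a fixed holdout set. The TF-IDF features, ridge regression, and synthetic “transcripts” below are illustrative assumptions, not the pipeline from our paper.

```python
# Sketch: convergent validity of ML scores as a function of training sample size.
# Synthetic data stands in for interview transcripts and human ratings.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)

# Hypothetical "transcripts": random bags of interview-relevant words.
vocab = np.array(["teamwork", "deadline", "conflict", "customer",
                  "initiative", "detail", "listen", "plan"])
docs = [rng.choice(vocab, size=40) for _ in range(2000)]
texts = [" ".join(d) for d in docs]

# Hypothetical human ratings that loosely track word usage, plus rating noise.
human = np.array([(d == "teamwork").sum() + (d == "listen").sum()
                  for d in docs], dtype=float)
human += rng.normal(0.0, 2.0, size=2000)

# Hold out a fixed test set so validity estimates are comparable across n.
train_texts, test_texts = texts[:1500], texts[1500:]
train_y, test_y = human[:1500], human[1500:]

for n in (100, 250, 500, 1000, 1500):
    model = make_pipeline(TfidfVectorizer(), Ridge(alpha=1.0))
    model.fit(train_texts[:n], train_y[:n])
    ml_scores = model.predict(test_texts)
    # Convergent validity: correlation between ML scores and human ratings.
    r = np.corrcoef(ml_scores, test_y)[0, 1]
    print(f"n = {n:4d}  convergent validity r = {r:.2f}")
```

In a setup like this, validity typically climbs steeply at small n and then flattens, which parallels the plateau we observed around 500 observations.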
Researchers and practitioners are already using large language models (LLMs), such as those that power ChatGPT, to score psychological constructs from text with little (few-shot) to no (zero-shot) additional training data. Even so, many organizations will continue to opt for traditional, supervised ML models for scoring open-ended content because they control the model and its training data and can adjust the model to improve fairness. In contrast, the developers of some LLMs have not shared their training data, no known methods reliably reduce bias in LLM outputs, and most organizations do not have the resources ($$$) to train an LLM from scratch.
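For readers curious what zero-shot scoring looks like in practice, the sketch below prompts an LLM to rate a construct directly from a response, with no training data at all. The openai client usage, model name, and construct are assumptions for illustration; the details vary by provider and library version.

```python
# Hypothetical zero-shot scoring of a psychological construct with an LLM.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def zero_shot_score(transcript: str, construct: str = "conscientiousness") -> str:
    # Ask the model to rate the construct directly from the text.
    prompt = (
        f"Rate the interviewee's {construct} on a 1-5 scale based on the "
        f"following interview response. Reply with only the number.\n\n"
        f"{transcript}"
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model name; substitute your own
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return response.choices[0].message.content.strip()

print(zero_shot_score("I always double-check my work before submitting it."))
```

Note the trade-off this paragraph describes: the prompt replaces the training data, but you give up control over the model, its training corpus, and your options for auditing or adjusting its scoring.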
LLMs have also lowered the barrier to getting started in this area of research, as they can help researchers and practitioners write code and learn to program. Combining NLP with ML is a powerful way to capitalize on the voluminous unstructured data relevant to organizations. Additional guidance and example code for getting started are available in the following publications.
Hickman, L., Saef, R., Ng, V., Tay, L., Woo, S. E., & Bosch, N. (2021). Developing and evaluating language-based machine learning algorithms for inferring applicant personality in video interviews. Human Resource Management Journal, advance online publication. https://doi.org/10.1111/1748-8583.12356
Hickman, L., Thapa, S., Tay, L., Cao, M., & Srinivasan, P. (2022). Text preprocessing for text mining in organizational research: Review and recommendations. Organizational Research Methods, 25(1), 114–146. https://doi.org/10.1177/1094428120971683
Rottman, C., Gardner, C., Liff, J., Mondragon, N., & Zuloaga, L. (2023). New strategies for addressing the diversity–validity dilemma with big data. Journal of Applied Psychology, 108(9), 1425–1444. https://doi.org/10.1037/apl0001084
Zhang, N., Wang, M., Xu, H., Koenig, N., Hickman, L., Kuruzovich, J., Ng, V., Arhin, K., Wilson, D., Song, Q. C., Tang, C., Alexander, L., & Kim, Y. (2023). Reducing subgroup differences in personnel selection through the application of machine learning. Personnel Psychology, 76(4), 1125–1159. https://doi.org/10.1111/peps.12593