Business and Management INK

Machine Learning Research Requires Smaller Sample Sizes than Previously Thought

August 13, 2024

In this post, authors Louis Hickman, Josh Liff, Caleb Rottman, and Charles Calderwood outline the inspiration behind their recently published academic paper, “The Effects of the Training Sample Size, Ground Truth Reliability, and NLP Method on Language-Based Automatic Interview Scores’ Psychometric Properties,” found in Organizational Research Methods.

Natural language processing (NLP) and supervised machine learning (ML) are increasingly used to score latent constructs from natural language text in both research and practice (e.g., personnel selection). However, many practitioners, researchers, and reviewers assume that ML requires thousands of observations. Our research aimed to empirically address this assumption, given that editors and reviewers have previously pressed us to obtain larger sample sizes.

Thus, we designed an experiment to understand how training sample size, NLP method, and reliability of the target variable affect the convergent validity and test-retest reliability of ML scores in the context of automatically scored interviews. We were surprised to find that, for most target variables, neither validity nor reliability improved much once the ML training data included at least 500 observations. Researchers can pursue projects on NLP and ML with just hundreds of observations, thereby significantly lowering the barrier to entry in this emerging area of research.
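To make the design concrete, here is a minimal, hypothetical sketch of a learning-curve analysis in this spirit. It is not the authors' code: it uses synthetic "interview" features, ridge regression as the supervised model, and treats the correlation between model scores and noisy human ratings as a stand-in for convergent validity.

```python
# Illustrative sketch (not the authors' pipeline): how convergent validity
# changes with training sample size, using synthetic data and ridge regression.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

n_total, n_features, n_test = 2000, 50, 500
true_w = rng.normal(size=n_features)
X = rng.normal(size=(n_total + n_test, n_features))
# Human "ground truth" scores with rating noise (imperfect reliability)
y = X @ true_w + rng.normal(scale=3.0, size=n_total + n_test)

X_test, y_test = X[n_total:], y[n_total:]

for n_train in (100, 250, 500, 1000, 2000):
    model = Ridge(alpha=1.0).fit(X[:n_train], y[:n_train])
    preds = model.predict(X_test)
    # Convergent validity proxy: correlation of ML scores with human scores
    r = np.corrcoef(preds, y_test)[0, 1]
    print(f"n_train={n_train:4d}  convergent r={r:.3f}")
```

With synthetic data like this, the correlation typically rises quickly at small sample sizes and then flattens, which is the qualitative pattern the paper reports for most target variables.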

Researchers and practitioners are already using large language models (LLMs), such as those that power ChatGPT, to score psychological constructs from text with little (few-shot) to no (zero-shot) additional training data. Even so, many organizations will continue to opt for traditional, supervised ML models for scoring open-ended content because they control the model and its training data and can adjust the model to improve fairness. By contrast, the developers of some LLMs have not disclosed their training data, established methods for reducing bias in LLM outputs are lacking, and most organizations do not have the resources ($$$) to train an LLM from scratch.

Two charts that depict the observed convergent correlations for interview-related constructs scored at the interview level. The chart on the left shows a visualization for the NLP method RoBERTa and binary unigrams, and the chart on the right shows a visualization for the NLP method RoBERTa (ridge). Both charts were created by Louis Hickman.
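As a toy illustration of the simpler NLP method named in the figure, the sketch below builds binary unigram features (0/1 word presence) and feeds them to a ridge regression, the same family of model referenced in "RoBERTa (ridge)." The responses and ratings are hypothetical; a RoBERTa-based pipeline would swap in transformer embeddings for the unigram vectors.

```python
# Toy sketch of a "binary unigrams + ridge" scoring pipeline:
# each interview response becomes a 0/1 vector of word presence,
# then ridge regression maps those features to interviewer ratings.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import Ridge

responses = [
    "I led the project team and resolved the conflict",
    "I prefer working alone on well defined tasks",
    "I organized meetings and delegated tasks to the team",
    "I avoided the disagreement and waited for instructions",
]
human_scores = [4.5, 2.0, 4.0, 1.5]  # hypothetical interviewer ratings

vectorizer = CountVectorizer(binary=True)  # binary unigram presence
X = vectorizer.fit_transform(responses)

model = Ridge(alpha=1.0).fit(X, human_scores)

# Score a new, unseen response
new_text = ["I coordinated the team and resolved issues"]
pred = model.predict(vectorizer.transform(new_text))
print(pred)
```

In practice, such models are trained on hundreds of rated responses rather than four, and evaluated against held-out human scores, as in the paper's design.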

LLMs have lowered the barrier for getting started in this area of research, as they can help researchers and practitioners write and learn how to code. Combining NLP with ML is a powerful way to capitalize on the voluminous amounts of unstructured data relevant to organizations. Additional guidance and example code for getting started are available in the following publications.

Hickman, L., Herde, C. N., Lievens, F., & Tay, L. (2023). Automatic scoring of speeded interpersonal assessment center exercises via machine learning: Initial psychometric evidence and practical guidelines. International Journal of Selection and Assessment, 31(2), 225-239. https://doi.org/10.1111/ijsa.12418

Hickman, L., Saef, R., Ng, V., Tay, L., Woo, S. E., & Bosch, N. (2021). Developing and evaluating language-based machine learning algorithms for inferring applicant personality in video interviews. Human Resource Management Journal, advance online publication. https://doi.org/10.1111/1748-8583.12356

Hickman, L., Thapa, S., Tay, L., Cao, M., & Srinivasan, P. (2022). Text preprocessing for text mining in organizational research: Review and recommendations. Organizational Research Methods, 25(1), 114-146. https://doi.org/10.1177/1094428120971683

Rottman, C., Gardner, C., Liff, J., Mondragon, N., & Zuloaga, L. (2023). New strategies for addressing the diversity–validity dilemma with big data. Journal of Applied Psychology, 108(9), 1425–1444. https://doi.org/10.1037/apl0001084

Zhang, N., Wang, M., Xu, H., Koenig, N., Hickman, L., Kuruzovich, J., Ng, V., Arhin, K., Wilson, D., Song, Q. C., Tang, C., Alexander, L., & Kim, Y. (2023). Reducing subgroup differences in personnel selection through the application of machine learning. Personnel Psychology, 76(4), 1125-1159. https://doi.org/10.1111/peps.12593

Louis Hickman (pictured) is an assistant professor of industrial-organizational psychology at Virginia Tech. He received his PhD from Purdue University and has research interests in applications of artificial intelligence, machine learning, and natural language processing in organizations, including addressing concerns about algorithmic bias and how these technologies influence workplace behavior. Josh Liff (PhD) is the director of research & development at HireVue, where he leads the organization's industrial-organizational psychology research team to guide the development and validation of next-generation assessment solutions that blend the state-of-the-art in industrial-organizational psychology with data science. Caleb Rottman (PhD) is the senior data scientist and technical lead at HireVue and has over 10 years of research experience in algorithms, statistics, data science, and applied mathematics. Charles Calderwood is an associate professor of industrial-organizational psychology at Virginia Tech. He received his PhD in psychology from the Georgia Institute of Technology and has research and methodological interests in occupational health, commuting, and dynamic multilevel statistical techniques.

