Automated Scoring of Creative Achievement

Noah Meinzer, Janika Saretzki, Simon M. Ceh, Mathias Benedek

Published online on April 08, 2026

Abstract

The Journal of Creative Behavior, Volume 60, Issue 2, June 2026.

The assessment of creative achievement (CA) can be cumbersome, as participants are typically asked to respond to long lists of possible accomplishments that may still miss their very specific achievements. A bottom-up alternative is to let participants openly report their most significant CAs; this, however, requires more complex scoring, such as human ratings. In this study, we investigated whether language models (LMs) can provide efficient and valid scoring of such open-ended responses. Across two data sets, participants described their three most significant CAs. These responses were rated by human judges and by three LMs (Llama 3.1–8B, Llama 3.3–70B, GPT-4o) using zero-shot prompting. Correlations between human and LM ratings were consistently high (r = 0.53–0.80), and criterion validity evidence for LM-based scores was largely on par with that for rater-based scores. In addition, we examined zero-shot domain classification of CAs into nine creative domains (e.g., music, visual arts). Classification accuracy was 62.3% overall; closer inspection suggested that automated classification has the potential to unveil conceptual overlaps between domains and to identify CAs involving multiple domains. Taken together, automated scoring of CA via LMs represents a promising and efficient alternative to traditional CA measures by approximating human ratings and providing useful domain classifications.
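The two agreement statistics named in the abstract, the Pearson correlation between human and LM ratings and the accuracy of domain classification, can be illustrated with a minimal Python sketch. This is not the authors' analysis code, which the abstract does not describe; the function names `pearson_r` and `classification_accuracy` and all values in the usage section are hypothetical and chosen only for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of ratings."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squared deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant ratings")
    return cov / sqrt(var_x * var_y)


def classification_accuracy(predicted, reference):
    """Share of responses whose predicted domain equals the reference domain."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("need two non-empty lists of equal length")
    hits = sum(p == r for p, r in zip(predicted, reference))
    return hits / len(reference)


if __name__ == "__main__":
    # Hypothetical toy data, NOT taken from the study.
    human_ratings = [1.0, 2.5, 3.0, 4.5, 2.0]
    lm_ratings = [1.5, 2.0, 3.5, 4.0, 2.5]
    print(f"r = {pearson_r(human_ratings, lm_ratings):.2f}")

    human_domains = ["music", "visual arts", "music", "literature"]
    lm_domains = ["music", "visual arts", "literature", "literature"]
    print(f"accuracy = {classification_accuracy(lm_domains, human_domains):.1%}")
```

In practice, an established routine such as `scipy.stats.pearsonr` would likely be preferable to a hand-written correlation; the explicit version is shown only to make the computation visible.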