Leveraging Multimodal Large Language Models to Analyse Student Exploration Behaviours in Educational Game Environments
Journal of Computer Assisted Learning
Published online on June 03, 2026
Abstract
Journal of Computer Assisted Learning, Volume 42, Issue 4, August 2026.

Background
Video data provides rich opportunities to examine student behaviour in game‐based learning environments, capturing not only observable actions but also subtle indicators of cognitive engagement. However, traditional video analysis is labor‐intensive, and current applications of AI to this task are limited by models trained on general‐purpose datasets, which often fail to capture the pedagogical meaning of student actions in authentic educational contexts.

Objectives
This study explores how multimodal large language models (MLLMs), specifically LLaVA‐Video‐7B‐Qwen2, can support qualitative video analysis of student exploration behaviours in a Minecraft‐based STEM learning environment.

Methods
We conducted an exploratory case study using screen recordings from a Minecraft‐based STEM learning environment. We tested multiple prompt strategies for guiding MLLM‐generated video descriptions and found that role‐assignment prompting performed most effectively. We evaluated model outputs using a mixed‐method framework that included quantitative scoring, GPT‐based judgement, and researcher validation.

Results and Conclusions
Our findings show that MLLMs can reliably identify surface‐level behaviours, such as navigation patterns and object interactions, but struggle to infer the intent or goals behind student actions, leading to a significant rate of over‐interpretation (26.5% when explaining student strategies). The model's outputs are sensitive to prompt phrasing, underscoring the importance of prompt engineering. While current MLLMs show promise for streamlining parts of the video analysis workflow, their use in educational contexts requires structured oversight and careful interpretation to ensure reliability and relevance.