
Task and rater effects in L2 speaking and writing: A synthesis of generalizability studies


Language Testing

Published online on

Abstract

We addressed, in two ways, the call by Deville and Chalhoub-Deville (2006), Schoonen (2012), and Xi and Mollaun (2006) for research into the contextual features thought to be related to person-by-task interactions within the framework of generalizability theory. First, we quantitatively synthesized the generalizability studies to determine the percentage of variation in L2 speaking and L2 writing performance that was accounted for by tasks, raters, and their interaction. Second, we examined the relationships between person-by-task interactions and moderator variables. We used 28 datasets from 21 studies for L2 speaking, and 22 datasets from 17 studies for L2 writing. Across modalities, most of the score variation was explained by examinees’ performance; the interaction effects of tasks or raters were greater than the independent effects of tasks or raters. Task and task-related interaction effects explained a greater percentage of the score variance than did the rater and rater-related interaction effects. The variances associated with the person-by-task interactions were larger for assessments based on both general and academic contexts than for those based only on academic contexts. Further, large person-by-task interactions were related to analytic scoring and to scoring criteria with task-specific language features. These findings, derived from L2 speaking studies, indicate that contexts, scoring methods, and scoring criteria might lead to varied performance across tasks. Consequently, constructs need to be defined with particular care.
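
As an illustration of the first step, the percentage of score variation attributed to each source in a fully crossed person-by-task-by-rater design can be obtained by dividing each estimated variance component by the sum of all components. The following is a minimal sketch of that arithmetic only. The function name and the component values are hypothetical, chosen for illustration; they are not taken from the synthesized studies and do not represent the authors' procedure.

```python
def variance_percentages(components):
    """Convert variance components to percentages of total variance.

    `components` maps a source label (e.g. "p", "t", "pt") to its
    estimated variance component. Negative estimates are assumed to
    have been set to zero beforehand, as is common practice.
    """
    total = sum(components.values())
    if total <= 0:
        raise ValueError("Total variance must be positive.")
    return {source: 100.0 * value / total
            for source, value in components.items()}


# Hypothetical variance components for a p x t x r design.
# These numbers are invented for illustration only.
example_components = {
    "p": 0.50,      # persons (examinees)
    "t": 0.02,      # tasks
    "r": 0.01,      # raters
    "pt": 0.20,     # person-by-task interaction
    "pr": 0.05,     # person-by-rater interaction
    "tr": 0.01,     # task-by-rater interaction
    "ptr,e": 0.21,  # three-way interaction confounded with error
}

percentages = variance_percentages(example_components)
for source, pct in percentages.items():
    print(f"{source:>6}: {pct:5.1f}%")
```

In this invented example the person-by-task component is far larger than the task or rater main effects, which is the kind of pattern the synthesis describes.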