
A Ranking Method for Evaluating Constructed Responses

Educational and Psychological Measurement


Abstract

This article presents a comparative judgment approach for holistically scored constructed-response tasks. In this approach, the grader rank-orders (rather than rates) the quality of a small set of responses. A prior automated evaluation of the responses guides both the formation of sets and the scaling of rankings: sets are formed from responses with similar prior scores, and the graders' subsequent rankings serve to update those prior scores. Final response scores are determined by weighting the prior and ranking information. This approach allows comparative judgments to be scaled on the basis of a single ranking, eliminates rater effects in scoring, and offers a conceptual framework for combining human and automated evaluation of constructed-response tasks. To evaluate the approach, groups of graders evaluated responses to two tasks using either the ranking approach (with sets of five responses) or the traditional rating approach. Results varied by task and by the relative weighting of prior versus ranking information, but in general the ranking scores showed generalizability (reliability) and validity coefficients comparable to those of the traditional ratings.
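The abstract does not specify how a ranking is converted to a score or how the prior and ranking weights are chosen, so the following is only a minimal sketch of the workflow under stated assumptions. Sets are formed by sorting responses on their prior (automated) score and cutting them into consecutive chunks of five. A ranking is scaled with a hypothetical rule that redistributes the set's own prior scores in the grader's rank order. Final scores use a fixed, hypothetical weight `weight_prior`. The function names `form_sets`, `ranking_scores`, and `final_scores`, the example scores, and the grader ordering are all illustrative inventions, not the article's method.

```python
from typing import Dict, List, Sequence


def form_sets(prior: Dict[str, float], set_size: int = 5) -> List[List[str]]:
    """Group response IDs into sets whose prior (automated) scores are similar.

    Responses are sorted by prior score and cut into consecutive chunks, so
    each set covers a narrow band of the prior-score scale. The last set may
    be smaller than `set_size`.
    """
    ordered = sorted(prior, key=lambda rid: prior[rid])
    return [ordered[i:i + set_size] for i in range(0, len(ordered), set_size)]


def ranking_scores(prior: Dict[str, float], ranking: Sequence[str]) -> Dict[str, float]:
    """Hypothetical scaling rule: redistribute a set's prior scores by rank.

    `ranking` lists one set's response IDs from best to worst. The response
    ranked first receives the highest prior score in the set, the second
    receives the next highest, and so on.
    """
    values = sorted((prior[rid] for rid in ranking), reverse=True)
    return dict(zip(ranking, values))


def final_scores(
    prior: Dict[str, float],
    rank_score: Dict[str, float],
    weight_prior: float = 0.5,
) -> Dict[str, float]:
    """Weighted combination of prior and ranking-based scores (weight is hypothetical)."""
    if not 0.0 <= weight_prior <= 1.0:
        raise ValueError("weight_prior must lie in [0, 1]")
    return {
        rid: weight_prior * prior[rid] + (1.0 - weight_prior) * rank_score[rid]
        for rid in prior
    }


if __name__ == "__main__":
    # Made-up prior scores for five responses (illustrative only).
    prior = {"a": 2.0, "b": 2.4, "c": 2.1, "d": 2.3, "e": 2.2}
    # Stand-in for a grader's judgment: a fixed best-to-worst ordering.
    grader_order = ["c", "a", "e", "b", "d"]
    rank_score: Dict[str, float] = {}
    for response_set in form_sets(prior, set_size=5):
        ranking = sorted(response_set, key=grader_order.index)
        rank_score.update(ranking_scores(prior, ranking))
    print(final_scores(prior, rank_score, weight_prior=0.5))
```

Under this particular rule, a ranking can only move a response within the narrow band of prior scores covered by its set, and raising `weight_prior` pulls the final scores toward the automated evaluation; with `weight_prior=1.0` the final scores equal the prior scores.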