
The Long‐Term Sustainability of IRT Scaling Methods in Mixed‐Format Tests


Journal of Educational Measurement

Published online on

Abstract

Because recent research on equating methodologies indicates that some methods may be more susceptible than others to the accumulation of equating error over multiple administrations, this study investigated the sustainability over time of several item response theory (IRT) equating methods. In particular, the paper focuses on two equating methodologies: fixed common item parameter scaling (in two variations, FCIP‐1 and FCIP‐2) and the Stocking and Lord characteristic curve scaling technique, in the presence of nonequivalent groups. Results indicated that the improvements made to fixed common item parameter scaling in the FCIP‐2 method were sustained over time. FCIP‐2 and Stocking and Lord characteristic curve scaling performed similarly in many instances, and both produced more accurate results than FCIP‐1. Their relative performance depended on the nature of the change in the ability distribution: Stocking and Lord characteristic curve scaling captured the change more accurately than FCIP‐2 when the change differed across the ability distribution, whereas FCIP‐2 captured it more accurately when the change was consistent across the ability distribution.
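For readers unfamiliar with the Stocking and Lord technique, the following is a minimal sketch of its criterion, not the paper's implementation. It assumes the three-parameter logistic (3PL) model with D = 1.7, an evenly spaced, equally weighted theta grid on the base scale, and a simple compass (pattern) search standing in for the quasi-Newton optimizers typically used. The function names (`p3pl`, `tcc`, `stocking_lord_loss`, `stocking_lord`) are hypothetical. The search looks for the slope A and intercept B such that the test characteristic curve of the common items, computed from new-form parameters transformed as (a/A, Ab + B, c), matches the curve from the base-form parameters as closely as possible.

```python
import math

D = 1.7  # logistic scaling constant (assumed; a common IRT convention)


def p3pl(theta, a, b, c):
    """3PL probability of a correct response at ability theta."""
    z = D * a * (theta - b)
    z = max(-700.0, min(700.0, z))  # keep math.exp within floating-point range
    return c + (1.0 - c) / (1.0 + math.exp(-z))


def tcc(theta, items):
    """Test characteristic curve: expected number-correct score on the common items."""
    return sum(p3pl(theta, a, b, c) for a, b, c in items)


def stocking_lord_loss(A, B, base_items, new_items, thetas):
    """Stocking-Lord criterion: squared TCC differences summed over the theta grid.

    New-form parameters are placed on the base scale via a -> a / A, b -> A * b + B
    (c unchanged), matching the ability transformation theta_base = A * theta_new + B.
    """
    moved = [(a / A, A * b + B, c) for a, b, c in new_items]
    return sum((tcc(t, base_items) - tcc(t, moved)) ** 2 for t in thetas)


def stocking_lord(base_items, new_items, thetas=None, tol=1e-6, max_iter=5000):
    """Estimate (A, B) by compass search (a stand-in for the usual quasi-Newton optimizer)."""
    if thetas is None:
        thetas = [-4.0 + 0.2 * k for k in range(41)]  # equally weighted grid (assumed)

    def loss(ab):
        return stocking_lord_loss(ab[0], ab[1], base_items, new_items, thetas)

    best, step = (1.0, 0.0), 0.5  # start from the identity transformation
    for _ in range(max_iter):
        if step < tol:
            break
        # Evaluate the 3 x 3 stencil around the current point, keeping A positive.
        stencil = [(best[0] + da, best[1] + db)
                   for da in (-step, 0.0, step)
                   for db in (-step, 0.0, step)
                   if best[0] + da > 0]
        candidate = min(stencil, key=loss)
        if loss(candidate) < loss(best):
            best = candidate  # move to the improving neighbor
        else:
            step /= 2.0  # no improvement: refine the stencil
    return best
```

The compass search keeps the sketch dependency-free; with real calibrations, a gradient-based optimizer and a theta grid weighted by the base-group ability distribution would be the more usual choices.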