Preference pulses and the win–stay, fix‐and‐sample model of choice

Journal of the Experimental Analysis of Behavior

Published online on

Abstract

Two groups of six rats each were trained to respond to two levers for a food reinforcer. One group was trained on concurrent variable‐ratio 20 extinction schedules of reinforcement. The second group was trained on a concurrent variable‐interval 27‐s extinction schedule. In both groups, lever‐schedule assignments changed randomly following reinforcement; a light cued the lever providing the next reinforcer. In the next condition, the light cue was removed and reinforcer assignment strictly alternated between levers. The next two conditions redetermined, in order, the first two conditions. Preference pulses, defined as a tendency for relative response rate to decline to the just‐reinforced alternative with time since reinforcement, only appeared during the extinction schedule. Although the pulse's functional form was well described by a reinforcer‐induction equation, there was a large residual between actual data and a pulse‐as‐artifact simulation (McLean, Grace, Pitts, & Hughes, 2014) used to discern reinforcer‐dependent contributions to pulsing. However, if that simulation was modified to include a win–stay tendency (a propensity to stay on the just‐reinforced alternative), the residual was greatly reduced. Additional modifications of the parameter values of the pulse‐as‐artifact simulation enabled it to accommodate the present results as well as those it originally accommodated. In its revised form, this simulation was used to create a model that describes response runs to the preferred alternative as terminating probabilistically, and runs to the unpreferred alternative as punctate with occasional perseverative response runs. After reinforcement, choices are modeled as returning briefly to the lever location that had been just reinforced. This win–stay propensity is hypothesized as due to reinforcer induction.
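The model described in the last sentences of the abstract can be illustrated with a short simulation. What follows is only a minimal sketch under stated assumptions. It is not the authors' simulation, nor the pulse-as-artifact simulation of McLean et al. (2014). The function name, every parameter name, and every default value are hypothetical and chosen purely for illustration. The sketch assumes that runs on the preferred lever end with a fixed probability per response, that visits to the unpreferred lever usually last one response but occasionally continue, and that a reinforcer holds the next few responses on the lever where it was delivered.

```python
import random


def simulate_choices(n, reinforced_at=(), preferred=0, p_end_run=0.1,
                     p_perseverate=0.2, win_stay_n=2, seed=0):
    """Return a list of n lever choices (0 or 1).

    All parameters are hypothetical illustration values, not estimates
    from the article.
    """
    rng = random.Random(seed)
    reinforced = set(reinforced_at)
    choices = []
    lever = preferred
    held = 0  # responses still held on the just-reinforced lever
    for i in range(n):
        choices.append(lever)
        if i in reinforced:
            # Win-stay: the next win_stay_n responses stay on this lever.
            held = win_stay_n
        if held > 0:
            held -= 1
            continue
        if lever == preferred:
            # "Fix": the run on the preferred lever ends probabilistically.
            if rng.random() < p_end_run:
                lever = 1 - preferred
        else:
            # "Sample": usually a single response, then back to the
            # preferred lever; occasionally the run perseverates.
            if rng.random() >= p_perseverate:
                lever = preferred
    return choices


if __name__ == "__main__":
    run = simulate_choices(200, reinforced_at=[50], win_stay_n=3, seed=1)
    print("proportion on preferred lever:", run.count(0) / len(run))
```

In this sketch, the responses immediately after a reinforced response always repeat the reinforced lever, which is the simplest way to represent the win–stay tendency. A fuller treatment would make that tendency probabilistic and let it decay with time since reinforcement.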