Transferring knowledge from human-demonstration trajectories to reinforcement learning

Transactions of the Institute of Measurement and Control

Published online on

Abstract

Transfer learning (TL) has become a crucial technique for accelerating the slow optimization of reinforcement learning (RL) by reusing knowledge acquired in a previous, related task. However, most existing work acquires this knowledge through RL training in the source task, which is itself time-consuming. We therefore propose a novel TL framework in which the agent extracts knowledge from human-demonstration trajectories of the source task and reuses it during RL in the target task. As for what to transfer, we adopt two forms of knowledge derived from the demonstration trajectories: the k-nearest neighbours of the current state among the source samples, and the visit frequency of homologous states. As for how to transfer, the former is used to recommend a preferred action whenever random exploration is needed, and the latter is used to shape the instantaneous reward for RL. Simulation experiments on balancing Cart-Poles of varying difficulty suggest that each form of knowledge clearly accelerates RL learning, and that the effect is even more pronounced when the two are combined. These results demonstrate the positive role of our framework in RL.
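The two transfer mechanisms described in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's formulas, so everything below is an assumption made for illustration: the class name `DemoKnowledge`, the Euclidean distance, the majority vote over neighbours, the grid discretization used to count visits, the `bin_width` and `scale` parameters, and the simplification that source and target tasks share one state space (so that "homologous" states are simply states in the same grid cell).

```python
import numpy as np


class DemoKnowledge:
    """Hypothetical container for source-task demonstration samples."""

    def __init__(self, states, actions, k=5, bin_width=0.5):
        self.states = np.asarray(states, dtype=float)  # shape (N, d)
        self.actions = np.asarray(actions, dtype=int)  # shape (N,)
        self.k = k
        self.bin_width = bin_width
        # Visit counts over a coarse grid (an assumed discretization).
        self.counts = {}
        for s in self.states:
            key = self._bin(s)
            self.counts[key] = self.counts.get(key, 0) + 1
        self.max_count = max(self.counts.values())

    def _bin(self, state):
        # Map a continuous state to the integer index of its grid cell.
        cell = np.floor(np.asarray(state, dtype=float) / self.bin_width)
        return tuple(cell.astype(int).tolist())

    def recommend_action(self, state):
        """Majority action among the k nearest demonstration states."""
        diffs = self.states - np.asarray(state, dtype=float)
        dists = np.linalg.norm(diffs, axis=1)
        nearest = np.argsort(dists)[: self.k]
        return int(np.bincount(self.actions[nearest]).argmax())

    def shaping_bonus(self, state, scale=0.1):
        """Bonus proportional to the demonstration visit count of the cell."""
        return scale * self.counts.get(self._bin(state), 0) / self.max_count


def select_action(q_values, state, demo, epsilon, rng):
    """Epsilon-greedy in which the exploratory branch asks the demonstrations."""
    if rng.random() < epsilon:
        return demo.recommend_action(state)  # replaces a uniformly random action
    return int(np.argmax(q_values))


def shaped_reward(env_reward, next_state, demo):
    """Environment reward plus the visit-frequency bonus."""
    return env_reward + demo.shaping_bonus(next_state)
```

In an RL loop, `select_action` would stand in for the usual epsilon-greedy choice and `shaped_reward` would wrap the environment reward before the value update. Whether the bonus should depend on the current or the next state, and how it should be scaled, are design choices the abstract leaves open.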