TY - CONF
T1 - Lifelong transfer learning with an option hierarchy
AU - Hawasly, Majd
AU - Ramamoorthy, Subramanian
PY - 2013
Y1 - 2013
N2 - Many applications require autonomous agents to respond quickly to task instances drawn from a rich family of qualitatively related tasks. We address the setting where the tasks share a state-action space and have the same qualitative objective but differ in dynamics. We adopt a transfer learning approach in which common structure in previously learnt policies, in the form of shared subtasks, is exploited to accelerate learning in subsequent ones. We use a probabilistic mixture model to describe regions in state space that are common to successful trajectories in different instances. Then, we extract policy fragments from previously learnt policies that are specialised to these regions. These policy fragments are options, whose initiation and termination sets are extracted automatically from data by the mixture model. In novel task instances, these options are used in an SMDP learning process, and option learning is repeated over the resulting policy library. The utility of this method is demonstrated through experiments in a standard navigation environment and then in the RoboCup simulated soccer domain with opponent teams of differing skill.
UR - http://www.scopus.com/inward/record.url?scp=84893793770&partnerID=8YFLogxK
U2 - 10.1109/IROS.2013.6696523
DO - 10.1109/IROS.2013.6696523
M3 - Conference contribution
AN - SCOPUS:84893793770
SN - 9781467363587
T3 - IEEE International Conference on Intelligent Robots and Systems
SP - 1341
EP - 1346
BT - IROS 2013
T2 - 2013 26th IEEE/RSJ International Conference on Intelligent Robots and Systems: New Horizon, IROS 2013
Y2 - 3 November 2013 through 8 November 2013
ER -