Lifelong learning of structure in the space of policies

Majd Hawasly, Subramanian Ramamoorthy

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

6 Citations (Scopus)

Abstract

We address the problem faced by an autonomous agent that must respond quickly to a family of qualitatively related tasks, such as a robot interacting with different types of human participants. We work in the setting where the tasks share a state-action space and have the same qualitative objective but differ in their dynamics and reward processes. We adopt a transfer approach in which the agent attempts to exploit common structure in learnt policies to accelerate learning on a new task. Our technique consists of a few key steps. First, we use a probabilistic model to describe the regions in state space which successful trajectories seem to prefer. Then, we extract policy fragments from previously learnt policies for these regions as candidates for reuse. These fragments may be treated as options, with corresponding domains and termination conditions extracted by unsupervised learning. Finally, the set of reusable policies is used when learning novel tasks, and the process repeats. The utility of this method is demonstrated through experiments in the simulated soccer domain, where the variability comes from the different possible behaviours of opponent teams, and the agent needs to perform well against novel opponents.
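The steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: plain k-means stands in for the probabilistic region model and for the unsupervised extraction of domains and termination conditions, states are assumed to be 2-D points, and every name (`Option`, `extract_options`, `kmeans`) is hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple

State = Tuple[float, float]


def dist(a: State, b: State) -> float:
    return math.hypot(a[0] - b[0], a[1] - b[1])


def kmeans(points: List[State], k: int, iters: int = 20) -> List[State]:
    """Plain k-means; an assumed stand-in for the probabilistic region model."""
    # Deterministic initialisation: k evenly spaced points from the data.
    centres = [points[i * len(points) // k] for i in range(k)]
    for _ in range(iters):
        groups: List[List[State]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centres[i]))
            groups[nearest].append(p)
        # Move each centre to the mean of its group; keep it if the group is empty.
        centres = [
            (sum(x for x, _ in g) / len(g), sum(y for _, y in g) / len(g)) if g else c
            for g, c in zip(groups, centres)
        ]
    return centres


@dataclass
class Option:
    """A policy fragment restricted to one region of state space."""
    centre: State
    radius: float
    policy: Dict[State, str]  # state -> action, copied from a learnt policy

    def can_start(self, s: State) -> bool:
        # Domain (initiation set): states inside the region.
        return dist(s, self.centre) <= self.radius

    def should_terminate(self, s: State) -> bool:
        # Termination condition: the agent has left the region.
        return not self.can_start(s)


def extract_options(
    trajectories: List[List[State]], policy: Dict[State, str], k: int
) -> List[Option]:
    """Cluster states visited by successful trajectories, then cut the
    learnt policy into one fragment per cluster."""
    visited = [s for traj in trajectories for s in traj]
    centres = kmeans(visited, k)
    options = []
    for c in centres:
        members = [s for s in visited if min(centres, key=lambda m: dist(s, m)) == c]
        if not members:
            continue  # no visited state fell in this region
        radius = max(dist(s, c) for s in members)
        fragment = {s: policy[s] for s in members if s in policy}
        options.append(Option(c, radius, fragment))
    return options
```

In a lifelong setting, the returned options would be added to the agent's action set when it starts learning a new task, and the extraction would be repeated once that task's policy has been learnt.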

Original language: English
Title of host publication: Lifelong Machine Learning - Papers from the AAAI Spring Symposium, Technical Report
Pages: 21-26
Number of pages: 6
Publication status: Published - 2013
Externally published: Yes
Event: 2013 AAAI Spring Symposium - Palo Alto, CA, United States
Duration: 25 Mar 2013 to 27 Mar 2013

Publication series

Name: AAAI Spring Symposium - Technical Report
Volume: SS-13-05

Conference

Conference: 2013 AAAI Spring Symposium
Country/Territory: United States
City: Palo Alto, CA
Period: 25/03/13 to 27/03/13
