Adaptive Q-learning for data-based optimal output regulation with experience replay

Biao Luo*, Yin Yang, Derong Liu

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

124 Citations (Scopus)

Abstract

In this paper, the data-based optimal output regulation problem of discrete-time systems is investigated. An off-policy adaptive Q-learning (QL) method is developed that uses real system data and requires neither knowledge of the system dynamics nor a mathematical model of the utility function. By introducing the Q-function, an off-policy adaptive QL algorithm is developed to learn the optimal Q-function. An adaptive parameter α_i in the policy evaluation step is used to achieve a trade-off between the current and future Q-functions. The convergence of the adaptive QL algorithm is proved and the influence of the adaptive parameter is analyzed. To realize the adaptive QL algorithm with real system data, an actor-critic neural network (NN) structure is developed. A least-squares scheme and a batch gradient descent method are used to update the critic and actor NN weights, respectively. The experience replay technique is employed in the learning process, which leads to simple and convenient implementation of the adaptive QL method. Finally, the effectiveness of the developed adaptive QL method is verified through numerical simulations.
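The sketch below is only an illustrative reading of the abstract, not the authors' algorithm. It assumes a linear plant (used solely to generate data), a Q-function that is quadratic in the state-input pair, a behavior policy for off-policy data collection stored in an experience-replay buffer, a least-squares policy-evaluation step for the critic, a batch gradient-descent step for the actor, and a simple convex-mixing interpretation of the adaptive parameter α_i; all names and hyperparameters are placeholders.

```python
# Minimal, illustrative sketch only -- NOT the exact method of the paper.
# Assumptions: linear plant for data generation, quadratic Q-function features,
# alpha_i realized as a convex mixing weight between successive critic fits.
import numpy as np

rng = np.random.default_rng(0)

# Data-generating system (used only to produce samples; the learner itself
# never accesses A, B, Qc, Rc directly).
A = np.array([[0.9, 0.2], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
Qc, Rc = np.eye(2), np.eye(1)
gamma = 0.95

def step(x, u):
    r = float(x @ Qc @ x + u @ Rc @ u)            # stage cost (utility)
    return A @ x + B @ u.reshape(-1), r

def feat(x, u):
    z = np.concatenate([x, u])                     # quadratic features of [x; u]
    return np.outer(z, z)[np.triu_indices(3)]      # upper-triangular terms

# Off-policy data collection with an exploratory behavior policy,
# stored in an experience-replay buffer of (x, u, r, x') tuples.
buffer = []
x = rng.standard_normal(2)
for _ in range(400):
    u = 0.3 * rng.standard_normal(1)               # behavior (exploration) policy
    x_next, r = step(x, u)
    buffer.append((x, u, r, x_next))
    x = x_next if np.linalg.norm(x_next) < 10 else rng.standard_normal(2)

# Adaptive QL iteration: least-squares critic, gradient-descent actor.
K = np.zeros((1, 2))                               # target policy u = -K x
theta = np.zeros(6)                                # critic weights
for i in range(30):
    alpha_i = 1.0 / (i + 1)                        # placeholder adaptive weight
    # Policy evaluation: least-squares fit of Q(x,u) = r + gamma * Q(x', pi(x')).
    Phi, y = [], []
    for (xk, uk, rk, xk1) in buffer:
        u1 = -K @ xk1                              # target-policy action at x'
        Phi.append(feat(xk, uk) - gamma * feat(xk1, u1))
        y.append(rk)
    theta_ls, *_ = np.linalg.lstsq(np.array(Phi), np.array(y), rcond=None)
    theta = (1 - alpha_i) * theta + alpha_i * theta_ls   # assumed mixing form

    # Policy improvement: batch gradient descent on K over replayed states,
    # using a numerical gradient of the learned Q(x, -Kx).
    for _ in range(50):
        grad = np.zeros_like(K)
        eps = 1e-4
        for (xk, _, _, _) in buffer[:100]:
            for idx in np.ndindex(K.shape):
                Kp = K.copy()
                Kp[idx] += eps
                grad[idx] += (theta @ feat(xk, -Kp @ xk)
                              - theta @ feat(xk, -K @ xk)) / eps
        K -= 1e-3 * grad / 100

print("learned feedback gain K =", K)
```

Note that the paper learns NN critic and actor weights rather than a linear feedback gain; the linear-quadratic setting here is chosen only so the replay buffer, least-squares critic fit, and batch actor update remain visible in a few dozen lines.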

Original language: English
Article number: 8351999
Pages (from-to): 3337-3348
Number of pages: 12
Journal: IEEE Transactions on Cybernetics
Volume: 48
Issue number: 12
DOIs
Publication status: Published - Dec 2018

Keywords

  • Data-based
  • Q-learning (QL)
  • experience replay
  • neural networks (NNs)
  • off-policy
  • optimal control
