Policy Iteration Q-Learning for Data-Based Two-Player Zero-Sum Game of Linear Discrete-Time Systems

Biao Luo*, Yin Yang, Derong Liu

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review


Abstract

In this article, the data-based two-player zero-sum game problem is considered for linear discrete-time systems. In theory, this problem reduces to solving the discrete-time game algebraic Riccati equation (DTGARE), which requires complete knowledge of the system dynamics. To avoid solving the DTGARE, a Q-function is introduced and a data-based policy iteration Q-learning (PIQL) algorithm is developed to learn the optimal Q-function from data collected from the real system. By writing the Q-function in quadratic form, it is proved, using the Fréchet derivative, that the PIQL algorithm is equivalent to Newton's iteration method in a Banach space, so the convergence of the PIQL algorithm is guaranteed by Kantorovich's theorem. To implement the PIQL algorithm, an off-policy learning scheme is proposed that uses real data rather than a system model. Finally, the effectiveness of the developed data-based PIQL method is validated through simulation studies.
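For a concrete picture of the procedure the abstract outlines, the following is a minimal sketch of a data-based PIQL loop for a linear discrete-time zero-sum game, under common assumptions rather than the paper's exact formulation: a quadratic Q-function Q(x, u, w) = z'Hz with z = [x; u; w], a stage cost x'Qx + u'Ru - gamma^2 w'w, and hypothetical plant matrices A, B, E that are used only to generate data, never by the learner. None of the numerical values below come from the paper.

# A minimal, illustrative sketch of data-based policy iteration Q-learning
# (PIQL) for a linear discrete-time two-player zero-sum game. All matrices,
# dimensions, and learning parameters are hypothetical placeholders.
import numpy as np

rng = np.random.default_rng(0)
n, m, q = 2, 1, 1                       # state, control, disturbance dimensions
A = np.array([[0.9, 0.1], [0.0, 0.8]])  # hypothetical stable plant (data generation only)
B = np.array([[0.0], [1.0]])
E = np.array([[0.1], [0.0]])
Qc, R, gamma2 = np.eye(n), np.eye(m), 5.0  # stage cost x'Qx + u'Ru - gamma^2 w'w

K = np.zeros((m, n))                    # initial control policy u = -K x
L = np.zeros((q, n))                    # initial disturbance policy w = -L x

def stage_cost(x, u, w):
    return x @ Qc @ x + u @ R @ u - gamma2 * (w @ w)

for it in range(10):
    # ---- Policy evaluation (off-policy): fit z'Hz - z_next'Hz_next = r ----
    # Exploratory behavior inputs generate the data; the target policies
    # (K, L) enter only through the successor aggregate z_next.
    Phi, y = [], []
    x = rng.standard_normal(n)
    for _ in range(200):
        u = -K @ x + 0.5 * rng.standard_normal(m)   # probing noise for excitation
        w = -L @ x + 0.5 * rng.standard_normal(q)
        x_next = A @ x + B @ u + E @ w
        z = np.concatenate([x, u, w])
        z_next = np.concatenate([x_next, -K @ x_next, -L @ x_next])
        Phi.append(np.kron(z, z) - np.kron(z_next, z_next))
        y.append(stage_cost(x, u, w))
        x = x_next
    h, *_ = np.linalg.lstsq(np.asarray(Phi), np.asarray(y), rcond=None)
    H = h.reshape(n + m + q, n + m + q)
    H = 0.5 * (H + H.T)                              # enforce a symmetric kernel

    # ---- Policy improvement: saddle point of the quadratic Q-function ----
    Huu, Huw = H[n:n+m, n:n+m], H[n:n+m, n+m:]
    Hwu, Hww = H[n+m:, n:n+m], H[n+m:, n+m:]
    Hux, Hwx = H[n:n+m, :n], H[n+m:, :n]
    K = np.linalg.solve(Huu - Huw @ np.linalg.solve(Hww, Hwu),
                        Hux - Huw @ np.linalg.solve(Hww, Hwx))
    L = np.linalg.solve(Hww - Hwu @ np.linalg.solve(Huu, Huw),
                        Hwx - Hwu @ np.linalg.solve(Huu, Hux))

print("learned control gain K:\n", K)
print("learned disturbance gain L:\n", L)

The off-policy character shows in the evaluation step: probing noise is added to the behavior inputs, while the policies being evaluated appear only inside the successor feature z_next, so the Bellman residual is fit by least squares without any knowledge of A, B, or E.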

Original language: English
Article number: 9005399
Pages (from-to): 3630-3640
Number of pages: 11
Journal: IEEE Transactions on Cybernetics
Volume: 51
Issue number: 7
DOIs
Publication status: Published - Jul 2021

Keywords

  • Adaptive dynamic programming (ADP)
  • Q-learning
  • discrete-time systems
  • policy iteration
  • two-player zero-sum game

