Abstract
In this paper, the data-based optimal output regulation problem of discrete-time systems is investigated. An off-policy adaptive Q-learning (QL) method is developed using real system data, without requiring knowledge of the system dynamics or a mathematical model of the utility function. By introducing the Q-function, an off-policy adaptive QL algorithm is developed to learn the optimal Q-function. An adaptive parameter αi in the policy evaluation is used to achieve a tradeoff between the current and future Q-functions. The convergence of the adaptive QL algorithm is proved, and the influence of the adaptive parameter is analyzed. To realize the adaptive QL algorithm with real system data, an actor-critic neural network (NN) structure is developed. A least-squares scheme and a batch gradient descent method are used to update the critic and actor NN weights, respectively. The experience replay technique is employed in the learning process, which makes the adaptive QL method simple and convenient to implement. Finally, the effectiveness of the developed adaptive QL method is verified through numerical simulations.
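The abstract outlines the main computational steps: Q-function evaluation blended through the adaptive parameter αi, a least-squares critic update, a batch gradient descent actor update, and experience replay over recorded off-policy data. The sketch below is only an illustrative reconstruction of such a loop for a toy linear-quadratic problem in Python; the system matrices, the quadratic critic basis, the convex-combination form of the αi update, and the finite-difference actor gradient are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

# Hypothetical discrete-time linear system x' = A x + B u with quadratic cost;
# (A, B) are used only to generate data, never inside the learning loop.
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
Qc, Rc = np.eye(2), np.eye(1)          # state / input cost weights (assumed)
n, m = 2, 1
rng = np.random.default_rng(0)

def phi(x, u):
    """Quadratic critic basis: upper triangle of the outer product of z = [x; u]."""
    z = np.concatenate([x, u])
    return np.outer(z, z)[np.triu_indices(n + m)]

# Experience replay buffer of off-policy transitions (x, u, r, x_next)
buffer = []
x = rng.standard_normal(n)
for _ in range(200):
    u = -0.5 * x[:m] + 0.1 * rng.standard_normal(m)     # exploratory behaviour policy
    r = x @ Qc @ x + u @ Rc @ u                          # utility measured from data
    x_next = A @ x + B @ u
    buffer.append((x.copy(), u.copy(), r, x_next.copy()))
    x = x_next if np.linalg.norm(x_next) < 10 else rng.standard_normal(n)

w_critic = np.zeros(phi(np.zeros(n), np.zeros(m)).size)  # critic NN weights
K = np.zeros((m, n))                                     # actor: linear feedback gain
alpha, lr, eps = 0.5, 0.05, 1e-4                         # adaptive parameter, actor step, FD step

for it in range(50):
    batch = [buffer[i] for i in rng.choice(len(buffer), 64, replace=False)]

    # Policy evaluation: least-squares critic update. The target mixes the current
    # Q-value and the one-step-ahead Q-value through alpha (assumed blending form).
    Phi, y = [], []
    for (xk, uk, rk, xk1) in batch:
        uk1 = -K @ xk1                                   # action from the current actor
        target = (1 - alpha) * (w_critic @ phi(xk, uk)) \
                 + alpha * (rk + w_critic @ phi(xk1, uk1))
        Phi.append(phi(xk, uk))
        y.append(target)
    w_critic, *_ = np.linalg.lstsq(np.array(Phi), np.array(y), rcond=None)

    # Policy improvement: batch gradient descent on the actor gain, using a crude
    # finite-difference estimate of dQ/dK averaged over the replay batch.
    grad = np.zeros_like(K)
    for (xk, _, _, _) in batch:
        q0 = w_critic @ phi(xk, -K @ xk)
        for i in range(m):
            for j in range(n):
                Kp = K.copy()
                Kp[i, j] += eps
                q1 = w_critic @ phi(xk, -Kp @ xk)
                grad[i, j] += (q1 - q0) / eps / len(batch)
    K -= lr * grad                                       # descend the learned Q-function

print("learned feedback gain K:\n", K)
```

In this sketch the least-squares step plays the role of the critic update and the finite-difference loop stands in for the batch gradient descent actor update described in the abstract; a practical implementation would use analytic gradients of the chosen NN parameterization.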
| Original language | English |
| --- | --- |
| Article number | 8351999 |
| Pages (from-to) | 3337-3348 |
| Number of pages | 12 |
| Journal | IEEE Transactions on Cybernetics |
| Volume | 48 |
| Issue number | 12 |
| DOIs | |
| Publication status | Published - Dec 2018 |
Keywords
- Data-based
- Q-learning (QL)
- experience replay
- neural networks (NNs)
- off-policy
- optimal control