Item title:
Self-improving Q-learning based controller for a class of dynamical processes
This paper presents how the Q-learning algorithm can be applied as a general-purpose self-improving controller in industrial automation, as a substitute for a conventional PI controller implemented without proper tuning. The traditional Q-learning approach is redefined to better fit practical control loops, including a new definition of the goal state based on the closed-loop reference trajectory and the discretization of the state space and of the accessible actions (manipulated variables). The properties of the Q-learning algorithm are investigated in terms of practical applicability, with special emphasis on initializing the Q-matrix based only on preliminary PI tunings, so that switching between the existing controller and the replacing Q-learning algorithm is bumpless. A general approach to the design of the Q-matrix and the learning policy is suggested, and the concept is systematically validated by simulation for two example processes, one exhibiting first-order dynamics and one exhibiting oscillatory second-order dynamics. The results show that online learning through interaction with the controlled process is possible and ensures a significant improvement in control performance compared with an arbitrarily tuned PI controller.
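The abstract describes the method only at a high level. The following is a minimal Python sketch of the general idea it outlines (discretized error states and control increments, a Q-matrix seeded from a preliminary PI tuning so the greedy policy starts out PI-like, and epsilon-greedy online learning against a simple plant). All numerical values, the gains, the reward, and the first-order plant model are illustrative assumptions, not the design used in the paper.

```python
import numpy as np

# State: control error e = r - y, discretized; actions: increments of the
# manipulated variable u. Ranges and gains are assumed values for illustration.
error_bins = np.linspace(-1.0, 1.0, 21)
du_actions = np.linspace(-0.1, 0.1, 11)

def state_index(e):
    """Nearest discrete state for a continuous control error."""
    return int(np.argmin(np.abs(error_bins - e)))

# Preliminary PI tuning (assumed), used only to seed the Q-matrix.
Kp, Ki, Ts = 2.0, 0.5, 0.1

# Seed Q so that the greedy action in each state is the increment closest to
# the PI move for that error; switching from PI to Q-learning then starts
# from PI-like behaviour (bumpless switching).
Q = np.zeros((len(error_bins), len(du_actions)))
for i, e in enumerate(error_bins):
    du_pi = (Kp + Ki * Ts) * e            # velocity-form PI increment, assuming e_prev = 0
    Q[i, :] = -np.abs(du_actions - du_pi)

alpha, gamma, epsilon = 0.1, 0.95, 0.05   # learning rate, discount, exploration
rng = np.random.default_rng(0)

def choose_action(e):
    """Epsilon-greedy choice of the control increment for the current error."""
    s = state_index(e)
    if rng.random() < epsilon:
        return int(rng.integers(len(du_actions)))
    return int(np.argmax(Q[s]))

def update(e, a, e_next):
    """Standard Q-learning update after the process response is observed."""
    s, s_next = state_index(e), state_index(e_next)
    reward = -e_next ** 2                 # penalize deviation from the reference
    Q[s, a] += alpha * (reward + gamma * np.max(Q[s_next]) - Q[s, a])

# Closed-loop use with an assumed first-order plant dy/dt = -y + u.
y, u, r = 0.0, 0.0, 1.0
for _ in range(500):
    e = r - y
    a = choose_action(e)
    u += du_actions[a]
    y += Ts * (-y + u)
    update(e, a, r - y)
```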
This work was financed by grants from SUT - the subsidy for maintaining and developing the research potential in 2020: K. Stebel, J. Czeczot (grant 02/060/BK_20/0007, BK-276/RAU3/2020) and J. Musial (grant BKM-724/RAU3/2020).