On the Convergence of TD-Learning on Markov Reward Processes with Hidden States
Number of Authors: 2
2024 (English)
In: European Control Conference (ECC), IEEE (Institute of Electrical and Electronics Engineers), 2024, p. 2097-2104
Conference paper, Published paper (Refereed)
Abstract [en]
We investigate the convergence properties of Temporal Difference (TD) Learning on Markov Reward Processes (MRPs) with new structures for incorporating hidden state information. In particular, each state has an observable and a hidden component, which are assumed to be statistically independent. This setup differs from Hidden Markov Models and Partially Observable Markov Decision Processes (POMDPs) in that the hidden information cannot be inferred from the state observations. Nevertheless, the hidden state influences the MRP through the rewards, rendering the reward sequence non-Markovian. We prove that TD learning, when applied only to the observable part of the states, converges to a fixed point under mild assumptions on the step size. Furthermore, we characterize this fixed point in terms of the statistical properties of the two Markov chains that govern the observable and hidden parts of the states. Beyond the theoretical results, we illustrate the new structure on two application setups in communications and validate our results experimentally, showing that the algorithm converges in practice.
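The setting in the abstract can be illustrated with a small simulation. The sketch below is our own illustration, not the paper's construction or its fixed-point characterization. It assumes finite state spaces, two independent ergodic Markov chains (observable `x` with matrix `P_obs`, hidden `h` with matrix `P_hid`), a reward table `R[x, h]`, and step sizes `alpha = n(x)^-0.7`, which satisfy the usual Robbins-Monro conditions. All names and numbers are hypothetical. Tabular TD(0) updates a value only for the observable state. As a back-of-envelope candidate for the limit, if the hidden chain is stationary with distribution `pi_H` and independent of `x`, the expected reward given `x` is `r_bar(x) = sum_h pi_H(h) R[x, h]`, so one would expect the TD limit to solve `V = r_bar + gamma * P_obs V`.

```python
import numpy as np


def stationary_distribution(P):
    """Stationary distribution of an ergodic row-stochastic matrix P."""
    evals, evecs = np.linalg.eig(P.T)
    v = np.real(evecs[:, np.argmin(np.abs(evals - 1.0))])
    return v / v.sum()


def td0_on_observable(P_obs, P_hid, R, gamma, n_steps, seed=0):
    """Tabular TD(0) that only sees the observable state x (hypothetical helper).

    (x, h) evolve as two independent Markov chains; the reward R[x, h]
    depends on the hidden state h, which the learner never observes.
    """
    rng = np.random.default_rng(seed)
    n_obs, n_hid = R.shape
    C_obs = np.cumsum(P_obs, axis=1)  # row-wise CDFs for inverse-transform sampling
    C_hid = np.cumsum(P_hid, axis=1)
    u = rng.random((n_steps, 2))
    V = np.zeros(n_obs)
    visits = np.zeros(n_obs)
    x, h = 0, 0
    for t in range(n_steps):
        r = R[x, h]  # reward drawn with the hidden component
        x_next = min(int(np.searchsorted(C_obs[x], u[t, 0], side="right")), n_obs - 1)
        h_next = min(int(np.searchsorted(C_hid[h], u[t, 1], side="right")), n_hid - 1)
        visits[x] += 1
        alpha = visits[x] ** -0.7  # sum of alpha diverges, sum of alpha^2 converges
        V[x] += alpha * (r + gamma * V[x_next] - V[x])  # TD(0) on x only
        x, h = x_next, h_next
    return V


def candidate_fixed_point(P_obs, P_hid, R, gamma):
    """Assumed limit: hidden state averaged out under its stationary distribution."""
    r_bar = R @ stationary_distribution(P_hid)
    return np.linalg.solve(np.eye(P_obs.shape[0]) - gamma * P_obs, r_bar)


# Hypothetical two-state example.
P_obs = np.array([[0.9, 0.1], [0.2, 0.8]])
P_hid = np.array([[0.7, 0.3], [0.4, 0.6]])
R = np.array([[1.0, 0.0], [0.5, 2.0]])
gamma = 0.5

V_td = td0_on_observable(P_obs, P_hid, R, gamma, n_steps=200_000)
V_star = candidate_fixed_point(P_obs, P_hid, R, gamma)
print("TD estimate:      ", V_td)
print("candidate V_star: ", V_star)
```

In this toy run the TD iterate should settle close to the candidate, even though the reward sequence seen from `x` alone is non-Markovian. The candidate relies on the hidden chain being stationary and independent of `x`; it is only a sanity check and need not coincide with the paper's more general characterization.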
Place, publisher, year, edition, pages
IEEE (Institute of Electrical and Electronics Engineers), 2024. p. 2097-2104
Keywords [en]
Machine learning algorithms, Monte Carlo methods, Heuristic algorithms, Temporal difference learning, Hidden Markov models, Rendering (computer graphics), Vectors
National Category
Computer Sciences
Research subject
Computer and Systems Sciences
Identifiers
URN: urn:nbn:se:su:diva-232983
DOI: 10.23919/ECC64448.2024.10591108
Scopus ID: 2-s2.0-85200556365
ISBN: 978-3-9071-4410-7 (electronic)
OAI: oai:DiVA.org:su-232983
DiVA, id: diva2:1893543
Conference
European Control Conference (ECC), 25-28 June, 2024, Stockholm, Sweden.
Available from: 2024-08-29
Created: 2024-08-29
Last updated: 2024-09-04
Bibliographically approved