On the Convergence of TD-Learning on Markov Reward Processes with Hidden States
Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences. ORCID iD: 0000-0002-6617-8683
Number of Authors: 2
2024 (English)
In: European Control Conference (ECC), IEEE (Institute of Electrical and Electronics Engineers), 2024, p. 2097-2104
Conference paper, Published paper (Refereed)
Abstract [en]

We investigate the convergence properties of Temporal Difference (TD) Learning on Markov Reward Processes (MRPs) with new structures for incorporating hidden state information. In particular, each state is characterized by both observable and hidden components, with the assumption that the observable and hidden parts are statistically independent. This setup differs from Hidden Markov Models and Partially Observable Markov Decision Models, in that here it is not possible to infer the hidden information from the state observations. Nevertheless, the hidden state influences the MRP through the rewards, rendering the reward sequence non-Markovian. We prove that TD learning, when applied only on the observable part of the states, converges to a fixed point under mild assumptions on the step-size. Furthermore, we characterize this fixed point in terms of the statistical properties of both the Markov chains representing the observable and hidden parts of the states. Beyond the theoretical results, we illustrate the novel structure on two application setups in communications. Furthermore, we validate our results through experimental evidence, showcasing the convergence of the algorithm in practice.
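To make the setup concrete, the following is a minimal sketch (not the paper's implementation) of TD(0) run on the observable component only. The two-state transition matrices `P_obs` and `P_hid`, the reward table `R`, and all numerical values are hypothetical illustrations of the structure the abstract describes: observable and hidden parts evolve as independent Markov chains, the reward depends on both, and the learner sees only the observable part.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy instance: observable and hidden state components evolve
# as independent two-state Markov chains; the reward depends on both.
P_obs = np.array([[0.9, 0.1], [0.2, 0.8]])  # transitions of the observable part
P_hid = np.array([[0.7, 0.3], [0.4, 0.6]])  # transitions of the hidden part
R = np.array([[1.0, -1.0], [0.5, 2.0]])     # R[s_obs, s_hid], illustrative values

gamma, alpha = 0.9, 0.01
V = np.zeros(2)           # value estimates over the *observable* states only
s_obs, s_hid = 0, 0

for _ in range(200_000):
    r = R[s_obs, s_hid]
    s_obs_next = rng.choice(2, p=P_obs[s_obs])
    s_hid_next = rng.choice(2, p=P_hid[s_hid])
    # TD(0) update using only the observable component; the hidden part
    # enters solely through the (non-Markovian) reward sequence.
    V[s_obs] += alpha * (r + gamma * V[s_obs_next] - V[s_obs])
    s_obs, s_hid = s_obs_next, s_hid_next

print(V)
```

In this toy run, `V` settles near the TD fixed point of the observable chain with rewards averaged over the hidden chain's stationary distribution, which is consistent with the convergence behavior the abstract describes for this class of processes.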

Place, publisher, year, edition, pages
IEEE (Institute of Electrical and Electronics Engineers), 2024. p. 2097-2104
Keywords [en]
Machine learning algorithms, Monte Carlo methods, Heuristic algorithms, Temporal difference learning, Hidden Markov models, Rendering (computer graphics), Vectors
National Category
Computer Sciences
Research subject
Computer and Systems Sciences
Identifiers
URN: urn:nbn:se:su:diva-232983
DOI: 10.23919/ECC64448.2024.10591108
Scopus ID: 2-s2.0-85200556365
ISBN: 978-3-9071-4410-7 (electronic)
OAI: oai:DiVA.org:su-232983
DiVA, id: diva2:1893543
Conference
European Control Conference (ECC), 25-28 June, 2024, Stockholm, Sweden.
Available from: 2024-08-29 Created: 2024-08-29 Last updated: 2024-09-04
Bibliographically approved

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full text
Scopus

Authority records

Amiri, Mohsen; Magnússon, Sindri
