Policy Evaluation with Delayed, Aggregated Anonymous Feedback
Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences. ORCID iD: 0000-0002-6617-8683
Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences. ORCID iD: 0000-0002-1912-712X
2022 (English). In: Discovery Science: 25th International Conference, DS 2022, Montpellier, France, October 10–12, 2022, Proceedings / [ed] Pascal Poncelet; Dino Ienco. Springer Nature, 2022, p. 114-123. Conference paper, Published paper (Refereed)
Abstract [en]

In reinforcement learning, an agent makes decisions to maximize rewards in an environment. Rewards are an integral part of reinforcement learning, as they guide the agent towards its learning objective. However, receiving consistent rewards can be infeasible in certain scenarios, due to cost, the nature of the problem, or other constraints. In this paper, we investigate the problem of delayed, aggregated, and anonymous rewards. We propose and analyze two strategies for conducting policy evaluation under cumulative periodic rewards, and study them using simulation environments. Our findings indicate that both strategies can achieve sample efficiency similar to that obtained with consistent rewards.
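The feedback setting described in the abstract can be illustrated with a minimal Python sketch. It assumes that the environment reveals, at the end of each fixed-length period, only the sum of the rewards collected during that period, and nothing at the intermediate steps. The function name `aggregate_feedback` and the `period` parameter are hypothetical. The sketch models the problem setting only; it is not either of the two policy-evaluation strategies proposed in the paper.

```python
from typing import List, Optional


def aggregate_feedback(rewards: List[float], period: int) -> List[Optional[float]]:
    """Hypothetical model of delayed, aggregated, anonymous feedback.

    Assumption (for illustration only): at the last step of each window of
    `period` steps, the sum of that window's rewards is revealed; at every
    other step nothing (None) is observed. Individual rewards, and which
    step produced them, are never revealed. A trailing incomplete window
    stays unrevealed.
    """
    if period < 1:
        raise ValueError("period must be >= 1")
    observed: List[Optional[float]] = []
    window_sum = 0.0
    for t, r in enumerate(rewards):
        window_sum += r  # accumulate hidden per-step reward
        if (t + 1) % period == 0:
            observed.append(window_sum)  # delayed, aggregated signal
            window_sum = 0.0
        else:
            observed.append(None)  # no feedback at intermediate steps
    return observed
```

For example, `aggregate_feedback([1.0, 0.0, 2.0, 3.0, 1.0], period=2)` returns `[None, 1.0, None, 5.0, None]`: the agent learns that the first two steps earned 1.0 in total, but not how that reward was split between them, and the final incomplete window remains unrevealed. With `period=1` the feedback reduces to ordinary per-step rewards.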

Place, publisher, year, edition, pages
Springer Nature, 2022. p. 114-123
Series
Lecture Notes in Computer Science (LNCS), ISSN 0302-9743, E-ISSN 1611-3349 ; 13601
Keywords [en]
Reinforcement learning, Markov Decision Process (MDP), Reward estimation
National Category
Computer Sciences
Research subject
Computer and Systems Sciences
Identifiers
URN: urn:nbn:se:su:diva-213202
DOI: 10.1007/978-3-031-18840-4_9
Scopus ID: 2-s2.0-85142725312
ISBN: 978-3-031-18839-8 (print)
ISBN: 978-3-031-18840-4 (electronic)
OAI: oai:DiVA.org:su-213202
DiVA, id: diva2:1721855
Conference
25th International Conference, DS 2022, Montpellier, France, October 10–12, 2022
Available from: 2022-12-22. Created: 2022-12-22. Last updated: 2023-01-04. Bibliographically approved.


Authors

Chaliane Junior, Guilherme Dinis; Magnússon, Sindri; Hollmén, Jaakko
