Explaining Black Box Reinforcement Learning Agents Through Counterfactual Policies
Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences. ORCID iD: 0000-0002-1912-712X
Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences. ORCID iD: 0000-0002-4632-4815
Number of Authors: 4
2023 (English)
In: Advances in Intelligent Data Analysis XXI: 21st International Symposium on Intelligent Data Analysis, IDA 2023, Louvain-la-Neuve, Belgium, April 12–14, 2023, Proceedings / [ed] Bruno Crémilleux; Sibylle Hess; Siegfried Nijssen, Springer, 2023, p. 314-326
Conference paper, Published paper (Refereed)
Abstract [en]

Despite the increased attention to explainable AI, explainability methods for understanding reinforcement learning (RL) agents have not been extensively studied. Failing to understand an agent’s behavior may reduce productivity in human-agent collaborations, or cause mistrust in automated RL systems. RL agents are trained to optimize a long-term cumulative reward, and in this work we formulate a novel problem: how to generate explanations of when an agent could have taken another action to optimize an alternative reward. More concretely, we aim to answer the question: what does an RL agent need to do differently to achieve an alternative target outcome? We introduce the concept of a counterfactual policy: a policy trained to explain in which states a black box agent could have taken an alternative action to achieve another desired outcome. The usefulness of counterfactual policies is demonstrated in two experiments with different use cases, and the results suggest that our solution can provide interpretable explanations.
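
The abstract describes the approach only at a high level. As a rough, hypothetical illustration of the counterfactual-policy idea (a minimal sketch, not the paper's implementation; the chain environment, both reward functions, and the plain Q-learning trainer are all assumptions introduced here), one can train a second policy on an alternative target reward and report the states where its greedy action deviates from the black box agent's:

```python
# Toy sketch of a counterfactual policy (hypothetical; not the paper's code).
# A "black box" tabular agent is trained on the original reward, a second
# agent on an alternative target reward; states where their greedy actions
# disagree indicate where the agent could have acted differently to pursue
# the alternative outcome.
import numpy as np

N_STATES, N_ACTIONS = 10, 2          # 1-D chain; actions: 0 = left, 1 = right

def step(state, action):
    """Deterministic chain dynamics, clipped at both ends."""
    return max(0, min(N_STATES - 1, state + (1 if action == 1 else -1)))

def q_learning(reward_fn, episodes=2000, alpha=0.1, gamma=0.9, eps=0.1, seed=0):
    """Plain tabular Q-learning against a state-based reward function."""
    rng = np.random.default_rng(seed)
    q = np.zeros((N_STATES, N_ACTIONS))
    for _ in range(episodes):
        s = int(rng.integers(N_STATES))
        for _ in range(20):          # short episodes are enough on this chain
            a = int(rng.integers(N_ACTIONS)) if rng.random() < eps else int(q[s].argmax())
            s2 = step(s, a)
            q[s, a] += alpha * (reward_fn(s2) + gamma * q[s2].max() - q[s, a])
            s = s2
    return q

# Original objective: reach the right end. Alternative target: reach the left end.
black_box_q      = q_learning(lambda s: 1.0 if s == N_STATES - 1 else 0.0)
counterfactual_q = q_learning(lambda s: 1.0 if s == 0 else 0.0)

for s in range(N_STATES):
    a_bb, a_cf = int(black_box_q[s].argmax()), int(counterfactual_q[s].argmax())
    if a_bb != a_cf:
        print(f"state {s}: agent took action {a_bb}, counterfactual policy suggests {a_cf}")
```

In this sketch the disagreement set itself serves as the explanation: each listed state names the alternative action the agent would have needed to take to reach the alternative target outcome.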

Place, publisher, year, edition, pages
Springer, 2023. p. 314-326
Series
Lecture Notes in Computer Science (LNCS), ISSN 0302-9743, E-ISSN 1611-3349
Keywords [en]
Explainable AI (XAI), Reinforcement Learning, Counterfactual Explanations
National Category
Computer Sciences
Research subject
Computer and Systems Sciences
Identifiers
URN: urn:nbn:se:su:diva-225173
DOI: 10.1007/978-3-031-30047-9_25
Scopus ID: 2-s2.0-85152589358
ISBN: 978-3-031-30046-2 (print)
ISBN: 978-3-031-30047-9 (electronic)
OAI: oai:DiVA.org:su-225173
DiVA, id: diva2:1825491
Conference
Advances in Intelligent Data Analysis XXI, 21st International Symposium on Intelligent Data Analysis, IDA 2023, Louvain-la-Neuve, Belgium, April 12–14, 2023.
Available from: 2024-01-09. Created: 2024-01-09. Last updated: 2024-01-10. Bibliographically approved.

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full text
Scopus

Authority records

Movin, Maria; Chaliane Junior, Guilherme Dinis; Hollmén, Jaakko; Papapetrou, Panagiotis
