We consider POMDPs in which the weight of the stage payoff depends on the past sequence of signals and actions occurring in the infinitely repeated problem. We prove that for all ε > 0, there exists a strategy that is ε-optimal for any sequence of weights that is regular enough. This unifies and generalizes several results in the literature, and applies notably to POMDPs with limsup payoffs.

History-dependent Evaluations in Partially Observable Markov Decision Process / Venel, Xavier Mathieu Raymond; Ziliotto, Bruno. - In: SIAM JOURNAL ON CONTROL AND OPTIMIZATION. - ISSN 0363-0129. - 59:2(2021), pp. 1730-1755. [10.1137/20M1332876]

History-dependent Evaluations in Partially Observable Markov Decision Process

Xavier Mathieu Raymond Venel; Bruno Ziliotto
2021

Markov decision processes, Partial Observation, Long-run average payoff
Files in this item:
Final_version.pdf
Access: archive managers only
Type: Pre-print document
License: DRM (Digital rights management) not defined
Size: 367.39 kB
Format: Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11385/207667