History-dependent Evaluations in Partially Observable Markov Decision Process / Venel, Xavier Mathieu Raymond; Ziliotto, Bruno. - In: SIAM JOURNAL ON CONTROL AND OPTIMIZATION. - ISSN 0363-0129. - 59:2(2021), pp. 1730-1755. [10.1137/20M1332876]
History-dependent Evaluations in Partially Observable Markov Decision Process
Xavier Mathieu Raymond Venel; Bruno Ziliotto
2021
Abstract
We consider POMDPs in which the weight of the stage payoff depends on the past sequence of signals and actions occurring in the infinitely repeated problem. We prove that for all ε > 0, there exists a strategy that is ε-optimal for any sequence of weights that is regular enough. This unifies and generalizes several results from the literature, and applies notably to POMDPs with limsup payoffs.
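The record does not state the model formally; the following is a minimal LaTeX sketch of how such a history-dependent evaluation is typically written, where the notation (initial belief z_1, strategy σ, state x_t, action a_t, signal s_t, stage payoff g, weights θ_t) is ours and not taken from the paper:

\[
\gamma_\theta(z_1,\sigma) \;=\; \mathbb{E}_{z_1,\sigma}\!\Bigg[\sum_{t\ge 1}\theta_t\, g(x_t,a_t)\Bigg],
\qquad \theta_t = \theta_t(a_1,s_1,\dots,a_{t-1},s_{t-1}) \ge 0,\quad \sum_{t\ge 1}\theta_t = 1 \ \text{a.s.}
\]

Under this reading, the abstract's claim is that for every ε > 0 there is a single strategy σ* satisfying
\[
\gamma_\theta(z_1,\sigma^*) \;\ge\; \sup_{\sigma}\gamma_\theta(z_1,\sigma) \;-\; \varepsilon
\]
simultaneously for every weight sequence θ that is regular enough.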
| File | Access | Type | License | Size | Format |
|---|---|---|---|---|---|
| Final_version.pdf | Restricted (archive managers only) | Pre-print | DRM not defined | 367.39 kB | Adobe PDF |
| HIstory_dependent_evaluation_SIAM.pdf | Open Access | Post-print | Creative Commons | 431.52 kB | Adobe PDF |