Strong uniform value in gambling houses and partially observable Markov decision processes

Venel X;
2016

Abstract

In several standard models of dynamic programming (gambling houses, Markov decision processes (MDPs), and partially observable MDPs (POMDPs)), we prove the existence of a robust notion of value for the infinitely repeated problem, namely the strong uniform value. This solves two open problems. First, it shows that for any ε > 0, the decision-maker has a pure strategy that is ε-optimal in every n-stage problem, provided that n is large enough (this result was previously known only for behavior strategies, that is, strategies that use randomization). Second, for any ε > 0, the decision-maker can guarantee the limit of the n-stage value minus ε in the infinite problem, where the payoff is the expectation of the inferior limit of the time-average payoff.
Dynamic programming, Markov decision processes, Partial Observation, Uniform value, Long-run average payoff
Venel, Xavier Mathieu Raymond; Ziliotto, B. (2016). Strong uniform value in gambling houses and partially observable Markov decision processes. SIAM Journal on Control and Optimization (ISSN 0363-0129), 54(4), 1983-2008. DOI: 10.1137/15M1043340.
Files in this record:
File: Ziliotto_2016.pdf
Access: archive administrators only
Type: Publisher's version
License: DRM (digital rights management) not defined
Size: 459 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11385/197479