Strong uniform value in gambling houses and partially observable Markov decision processes

Venel X;
2016

Abstract

In several standard models of dynamic programming (gambling houses, Markov decision processes (MDPs), and partially observable MDPs (POMDPs)), we prove the existence of a robust notion of value for the infinitely repeated problem, namely, the strong uniform value. This solves two open problems. First, it shows that for any ε > 0, the decision-maker has a pure strategy that is ε-optimal in any n-stage problem, provided that n is big enough (this result was previously known only for behavior strategies, that is, strategies which use randomization). Second, for any ε > 0, the decision-maker can guarantee the limit of the n-stage value minus ε in the infinite problem, where the payoff is the expectation of the inferior limit of the time average payoff.
Dynamic programming, Markov decision processes, Partial Observation, Uniform value, Long-run average payoff
Strong uniform value in gambling houses and partially observable Markov decision processes / Venel, X; Ziliotto, B. - In: SIAM JOURNAL ON CONTROL AND OPTIMIZATION. - ISSN 0363-0129. - 54:4(2016), pp. 1983-2008. [10.1137/15M1043340]
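
The two statements in the abstract can be written out as a short sketch. The notation is an assumption introduced here for illustration and is not taken from this record: r_m is the stage payoff at stage m, σ is a strategy, E_σ is the expectation it induces, and v_n is the n-stage value. Dependence on the initial state is omitted.

```latex
% Assumed notation (not from the record): r_m, \sigma, \mathbb{E}_{\sigma}, v_n.
% n-stage value: best expected average payoff over the first n stages.
\[
  v_n \;=\; \sup_{\sigma}\, \mathbb{E}_{\sigma}\!\left[\frac{1}{n}\sum_{m=1}^{n} r_m\right].
\]
% First statement: for every \varepsilon > 0 there exist a pure strategy \sigma
% and an integer n_0 such that, for all n \ge n_0,
\[
  \mathbb{E}_{\sigma}\!\left[\frac{1}{n}\sum_{m=1}^{n} r_m\right] \;\ge\; v_n - \varepsilon .
\]
% Second statement: for every \varepsilon > 0 there exists a strategy \sigma such that
\[
  \mathbb{E}_{\sigma}\!\left[\liminf_{n\to\infty}\frac{1}{n}\sum_{m=1}^{n} r_m\right]
  \;\ge\; \lim_{n\to\infty} v_n \;-\; \varepsilon .
\]
```

In the second inequality the inferior limit sits inside the expectation, so the guarantee concerns the long-run average along realized payoff paths and not only the average of expected payoffs.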
Files in this record:
File: Ziliotto_2016.pdf
Access: Archive managers only
Type: Publisher's version
License: DRM not defined
Size: 459 kB
Format: Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11385/197479