In several standard models of dynamic programming (gambling houses, Markov decision processes (MDPs), and partially observable MDPs (POMDPs)), we prove the existence of a robust notion of value for the infinitely repeated problem, namely the strong uniform value. This solves two open problems. First, for any ε > 0, the decision-maker has a pure strategy that is ε-optimal in every n-stage problem, provided that n is sufficiently large (this result was previously known only for behavior strategies, that is, strategies that use randomization). Second, for any ε > 0, the decision-maker can guarantee the limit of the n-stage value minus ε in the infinite problem, where the payoff is the expectation of the limit inferior of the time-average payoff.
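To make the objects in this abstract concrete, here is a minimal sketch for the simplest of the three models, a finite, fully observed MDP. It assumes that the n-stage payoff is the expected average of the first n stage rewards. The toy MDP `TOY_MDP` and the function `n_stage_value` are hypothetical illustrations, not taken from the paper. For each fixed n, backward induction returns the n-stage value v_n and a pure (deterministic) Markov strategy that is optimal for that particular n. In this example the optimal first action changes with n ("safe" for small n, "risky" for large n) while v_n(0) approaches 0.5. This suggests why a single pure strategy that is ε-optimal for all sufficiently large n is a nontrivial requirement.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy MDP: mdp[state][action] = (stage reward, [(next_state, prob), ...])
Transition = List[Tuple[int, float]]
MDP = Dict[int, Dict[str, Tuple[float, Transition]]]

TOY_MDP: MDP = {
    # State 0: a steady reward of 0.4, or a gamble between a good and a bad absorbing state.
    0: {"safe": (0.4, [(0, 1.0)]),
        "risky": (0.0, [(1, 0.5), (2, 0.5)])},
    1: {"stay": (1.0, [(1, 1.0)])},  # good absorbing state
    2: {"stay": (0.0, [(2, 1.0)])},  # bad absorbing state
}


def n_stage_value(mdp: MDP, n: int) -> Tuple[Dict[int, float], List[Dict[int, str]]]:
    """Backward induction for the n-stage problem with payoff (1/n) * E[r_1 + ... + r_n].

    Returns the normalized value v_n for each state and a pure Markov strategy:
    policy[t][s] is the action to play at stage t + 1 when in state s.
    """
    w = {s: 0.0 for s in mdp}  # W_0 = 0: total payoff with no stages left
    decisions_from_end: List[Dict[int, str]] = []
    for _ in range(n):
        new_w: Dict[int, float] = {}
        decision: Dict[int, str] = {}
        for s, actions in mdp.items():
            best_action: Optional[str] = None
            best_value = float("-inf")
            for a, (reward, transition) in actions.items():
                # Stage reward plus expected total payoff of the remaining stages.
                q = reward + sum(p * w[t] for t, p in transition)
                if q > best_value:
                    best_action, best_value = a, q
            new_w[s] = best_value
            decision[s] = best_action
        w = new_w
        decisions_from_end.append(decision)
    # decisions_from_end[k] was computed with k + 1 stages left; reverse to get stage order.
    policy = list(reversed(decisions_from_end))
    return {s: w[s] / n for s in mdp}, policy


if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        v, pol = n_stage_value(TOY_MDP, n)
        print(n, round(v[0], 4), pol[0][0])
```

Here the limit of v_n(0) exists (it is 0.5). The strategy "play risky at stage 1" guarantees close to this limit for every large n, but it is not optimal for small n. The paper's results concern the general existence of such strategies, including in gambling houses and POMDPs, which this sketch does not cover.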
|Title:||Strong uniform value in gambling houses and partially observable Markov decision processes|
|Publication date:||2016|
|Item type:||01.1 - Journal article (Article)|