IRIS - Institutional Research Information System

In several standard models of dynamic programming (gambling houses, Markov decision processes (MDPs), and partially observable MDPs (POMDPs)), we prove the existence of a robust notion of value for the infinitely repeated problem, namely, the strong uniform value. This solves two open problems. First, it shows that for any ε > 0, the decision maker has a pure strategy that is ε-optimal in any n-stage problem, provided that n is big enough (this result was previously known only for behavior strategies, that is, strategies which use randomization). Second, for any ε > 0, the decision maker can guarantee the limit of the n-stage value minus ε in the infinite problem, where the payoff is the expectation of the inferior limit of the time average payoff.
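
The two statements in the abstract can be written out roughly as follows. The notation is assumed here for illustration and does not come from this record: g_m stands for the payoff at stage m, x for the initial state, σ for a strategy, v_n(x) for the value of the n-stage problem, and v*(x) for its limit, which the abstract takes to exist.

```latex
% Notation assumed for illustration (not taken from the record):
% g_m = payoff at stage m, x = initial state, \sigma = strategy.
\[
  v_n(x) = \sup_{\sigma} \mathbb{E}^{x}_{\sigma}\!\left[ \frac{1}{n} \sum_{m=1}^{n} g_m \right],
  \qquad
  v^*(x) = \lim_{n \to \infty} v_n(x).
\]
% First statement: a single pure strategy is \varepsilon-optimal in every long enough n-stage problem.
\[
  \forall \varepsilon > 0,\ \exists\, \sigma \text{ pure},\ \exists\, n_0,\ \forall n \ge n_0:\quad
  \mathbb{E}^{x}_{\sigma}\!\left[ \frac{1}{n} \sum_{m=1}^{n} g_m \right] \ge v^*(x) - \varepsilon .
\]
% Second statement: the same level is guaranteed when the payoff is the expected liminf of the time average.
\[
  \forall \varepsilon > 0,\ \exists\, \sigma:\quad
  \mathbb{E}^{x}_{\sigma}\!\left[ \liminf_{n \to \infty} \frac{1}{n} \sum_{m=1}^{n} g_m \right] \ge v^*(x) - \varepsilon .
\]
```

Read this way, the second inequality is the stronger requirement: it bounds the expectation of the pathwise inferior limit rather than the expected average over a fixed horizon.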

Venel, Xavier Mathieu Raymond; Ziliotto, B. (2016). Strong uniform value in gambling houses and partially observable Markov decision processes. SIAM Journal on Control and Optimization (ISSN: 0363-0129), 54(4), 1983-2008. DOI: 10.1137/15M1043340.

Strong uniform value in gambling houses and partially observable Markov decision processes

Venel X; Ziliotto B

2016



Year of publication: 2016

Keywords: Dynamic programming, Markov decision processes, Partial observation, Uniform value, Long-run average payoff
Document type: 01.1 - Journal article (Article)

Files in this record:

File	Access	Type	License	Size	Format
Ziliotto_2016.pdf	Archive managers only	Publisher's version	DRM (digital rights management) not defined	459 kB	Adobe PDF


Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11385/197479
