A Study of Multi-Dimensional Payoff Functions in Partially Observable Markov Decision Processes

cs.AI updates on arXiv.org, September 30


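This paper studies the structure of multi-dimensional payoff functions in partially observable Markov decision processes and examines what kinds of strategies are needed to achieve a given expected payoff vector. It shows that pure strategies do not suffice in general, and proves that mixing finitely many pure strategies is enough to approximate any expected payoff vector.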

arXiv:2502.18296v2 Announce Type: replace-cross Abstract: We consider multi-dimensional payoff functions in partially observable Markov decision processes. We study the structure of the set of expected payoff vectors of all strategies (policies) and study what kind of strategies are needed to achieve a given expected payoff vector. In general, pure strategies (i.e., strategies that do not resort to randomisation) do not suffice for this problem. We prove that for any payoff for which the expectation is well-defined under all strategies, it is sufficient to mix finitely many pure strategies (i.e., to randomly select a pure strategy at the start of a play and commit to it for the rest of the play) to approximate any expected payoff vector up to any precision. Furthermore, for any payoff for which the expected payoff is finite under all strategies, any expected payoff can be obtained exactly by mixing finitely many strategies.
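To make the notion of mixing concrete, here is a minimal sketch in Python. It is not taken from the paper: the one-step decision problem, the action names `a` and `b`, and the helper `mixed_payoff` are all hypothetical and chosen only for illustration. It shows a two-dimensional target payoff that no pure strategy attains, but that is obtained exactly by selecting one of two pure strategies at random at the start of the play.

```python
from fractions import Fraction

# Hypothetical toy problem (not from the paper): a single decision with two
# actions, each yielding a fixed 2-dimensional payoff vector.
PURE_PAYOFFS = {
    "a": (Fraction(1), Fraction(0)),
    "b": (Fraction(0), Fraction(1)),
}


def mixed_payoff(weights):
    """Expected payoff vector when one pure strategy is drawn according to
    `weights` at the start of the play and then followed throughout."""
    if sum(weights.values()) != 1 or any(w < 0 for w in weights.values()):
        raise ValueError("weights must form a probability distribution")
    dims = len(next(iter(PURE_PAYOFFS.values())))
    # The expectation is the convex combination of the pure payoff vectors.
    return tuple(
        sum(w * PURE_PAYOFFS[name][i] for name, w in weights.items())
        for i in range(dims)
    )


target = (Fraction(1, 2), Fraction(1, 2))

# No pure strategy attains the target in this toy problem...
pure_hits_target = target in PURE_PAYOFFS.values()

# ...but an even mixture of the two pure strategies attains it exactly.
mix = mixed_payoff({"a": Fraction(1, 2), "b": Fraction(1, 2)})
```

Exact rationals from `fractions` are used so that the comparison with the target vector is not affected by floating-point rounding.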


Related tags

Markov decision process, payoff function, strategy mixing
