QMDP Optimization: Policies for Quantiles of Cumulative Reward

cs.AI updates on arXiv.org, October 16

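This paper studies the problem of optimizing quantiles of the cumulative reward of a Markov decision process, which it calls a quantile MDP (QMDP). It provides analytical results characterizing the optimal value function, designs a dynamic programming algorithm to solve for the optimal policy, and applies the model to an HIV treatment initiation problem.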

arXiv:1711.05788v5 Announce Type: replace

Abstract: The goal of a traditional Markov decision process (MDP) is to maximize expected cumulative reward over a defined horizon (possibly infinite). In many applications, however, a decision maker may be interested in optimizing a specific quantile of the cumulative reward instead of its expectation. In this paper we consider the problem of optimizing the quantiles of the cumulative rewards of a Markov decision process (MDP), which we refer to as a quantile Markov decision process (QMDP). We provide analytical results characterizing the optimal QMDP value function and present a dynamic programming-based algorithm to solve for the optimal policy. The algorithm also extends to the MDP problem with a conditional value-at-risk (CVaR) objective. We illustrate the practical relevance of our model by evaluating it on an HIV treatment initiation problem, where patients aim to balance the potential benefits and risks of the treatment.
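To make the difference between the two objectives concrete, the following is a minimal sketch, not the paper's dynamic programming algorithm. It assumes a hypothetical two-state toy MDP (the table `P`, the action names `safe` and `risky`, and all helper functions are invented for illustration), uses the lower quantile convention (the smallest value g with P(G <= g) >= tau), and simply enumerates stationary deterministic policies by brute force. Quantile-optimal policies may in general depend on the reward accumulated so far, so restricting to stationary policies is a simplification made here for brevity.

```python
from itertools import product

# Hypothetical toy MDP (illustrative only, not taken from the paper).
# P[state][action] lists (probability, next_state, reward) outcomes.
P = {
    0: {"safe": [(1.0, 0, 1.0)],
        "risky": [(0.5, 1, 4.0), (0.5, 0, -1.0)]},
    1: {"safe": [(1.0, 1, 1.0)],
        "risky": [(0.5, 1, 4.0), (0.5, 0, -1.0)]},
}

def return_distribution(policy, start, horizon):
    """Exact distribution of cumulative reward under a stationary policy."""
    dist = {(start, 0.0): 1.0}  # (state, reward so far) -> probability
    for _ in range(horizon):
        nxt = {}
        for (s, g), p in dist.items():
            for q, s2, r in P[s][policy[s]]:
                key = (s2, g + r)
                nxt[key] = nxt.get(key, 0.0) + p * q
        dist = nxt
    # Marginalize out the final state, keeping only the cumulative reward.
    out = {}
    for (_, g), p in dist.items():
        out[g] = out.get(g, 0.0) + p
    return out

def quantile(dist, tau):
    """Lower tau-quantile: smallest g with P(G <= g) >= tau."""
    cum = 0.0
    for g in sorted(dist):
        cum += dist[g]
        if cum >= tau - 1e-12:
            return g
    return max(dist)

def mean(dist):
    """Expected cumulative reward."""
    return sum(g * p for g, p in dist.items())

def best_policy(score, start=0, horizon=3):
    """Brute-force search over stationary deterministic policies."""
    policies = [dict(zip(P, acts))
                for acts in product(["safe", "risky"], repeat=len(P))]
    return max(policies,
               key=lambda pi: score(return_distribution(pi, start, horizon)))

mean_pi = best_policy(mean)
q_pi = best_policy(lambda d: quantile(d, 0.1))
print("expectation-optimal policy:", mean_pi)
print("0.1-quantile-optimal policy:", q_pi)
```

In this toy instance the expectation criterion prefers the risky action, while the 0.1-quantile criterion prefers the safe action in the start state, because the risky policy has a 12.5% chance of ending with a cumulative reward of -3.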

Related tags

QMDP, cumulative reward, dynamic programming, HIV treatment, optimization
