Dynamic preference inference network: Improving sample efficiency for multi-objective reinforcement learning by preference estimation

Document type: Foreign-language journal

First author: Liu, Yang

Authors: Liu, Yang; Zhou, Ying; He, Ziming; Yang, Yusen; Li, Jingchen; Han, Qingcen

Author affiliations:

Keywords: Multi-objective reinforcement learning; Sample efficiency; Reinforcement learning

Journal: KNOWLEDGE-BASED SYSTEMS (Impact factor: 7.6; 5-year impact factor: 7.6)

ISSN: 0950-7051

Year/volume/issue: 2024, Volume 304

Pages:

Indexed in: SCI

Abstract: Multi-objective reinforcement learning (MORL) addresses the challenge of optimizing policies in environments with multiple conflicting objectives. Traditional approaches often rely on scalar utility functions, which require predefined preference weights, limiting their adaptability and efficiency. To overcome this, we propose the Dynamic Preference Inference Network (DPIN), a novel method designed to enhance sample efficiency by dynamically estimating the trajectory decision preference of the agent. DPIN leverages a neural network to predict the most favorable preference distribution for each trajectory, enabling more effective policy updates and improving overall performance in complex MORL tasks. Extensive experiments in various benchmark environments demonstrate that DPIN significantly outperforms existing state-of-the-art methods, achieving higher scalarized returns and hypervolume. Our findings highlight DPIN's ability to adapt to varying preferences, reduce sample complexity, and provide robust solutions in multi-objective settings.
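
Note: the "scalar utility functions" with "predefined preference weights" mentioned in the abstract are commonly realized as a weighted sum of per-objective returns. The Python sketch below illustrates that general idea only; it is not the DPIN method. The function names `scalarize` and `best_preference` are hypothetical, and the brute-force search over a small candidate set is a simple stand-in for the neural-network preference estimation the abstract describes.

```python
from typing import Sequence, Tuple


def scalarize(returns: Sequence[float], weights: Sequence[float]) -> float:
    """Linear scalarization: weighted sum of per-objective returns."""
    if len(returns) != len(weights):
        raise ValueError("returns and weights must have the same length")
    # Assumption: preference weights are non-negative and sum to 1.
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    return sum(w * r for w, r in zip(weights, returns))


def best_preference(
    returns: Sequence[float],
    candidates: Sequence[Tuple[float, ...]],
) -> Tuple[float, ...]:
    """Return the candidate weight vector under which a trajectory scores highest.

    Hypothetical helper: an exhaustive search over a fixed candidate set,
    used here only to illustrate "most favorable preference for a trajectory".
    """
    return max(candidates, key=lambda w: scalarize(returns, w))


# A trajectory with a return of 10.0 on objective 1 and 2.0 on objective 2.
trajectory_returns = [10.0, 2.0]
candidate_weights = [(0.2, 0.8), (0.5, 0.5), (0.8, 0.2)]
favored = best_preference(trajectory_returns, candidate_weights)
```

With fixed weights, a trajectory that does well on objective 1 is only rewarded when the weights happen to favor objective 1; picking the weights per trajectory, as in `best_preference`, is one simple way to see why estimating preferences could make each sample more useful.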

Classification number:
