
Dynamic preference inference network: Improving sample efficiency for multi-objective reinforcement learning by preference estimation

Document type: Foreign journal article

Authors: Liu, Yang (1); Zhou, Ying (2); He, Ziming (2); Yang, Yusen (3); Han, Qingcen (4); Li, Jingchen (3)

Author affiliations: 1. Zhejiang Univ, Coll Opt Sci & Engn, Hangzhou 310058, Zhejiang, Peoples R China

2. Northwestern Polytech Univ, Sch Comp Sci, Xian 710072, Shaanxi, Peoples R China

3. Beijing Acad Agr & Forestry Sci, Informat Technol Res Ctr, Beijing 100079, Peoples R China

4. Northwestern Polytech Univ, Elect Informat Coll, Xian 710072, Shaanxi, Peoples R China

Keywords: Multi-objective reinforcement learning; Sample efficiency; Reinforcement learning

Journal: KNOWLEDGE-BASED SYSTEMS (Impact factor: 7.6; 5-year impact factor: 7.6)

ISSN: 0950-7051

Year/Volume: 2024, Vol. 304

Pages:

Indexed in: SCI

Abstract: Multi-objective reinforcement learning (MORL) addresses the challenge of optimizing policies in environments with multiple conflicting objectives. Traditional approaches often rely on scalar utility functions, which require predefined preference weights, limiting their adaptability and efficiency. To overcome this, we propose the Dynamic Preference Inference Network (DPIN), a novel method designed to enhance sample efficiency by dynamically estimating the agent's decision preference for each trajectory. DPIN leverages a neural network to predict the most favorable preference distribution for each trajectory, enabling more effective policy updates and improving overall performance in complex MORL tasks. Extensive experiments in various benchmark environments demonstrate that DPIN significantly outperforms existing state-of-the-art methods, achieving higher scalarized returns and hypervolume. Our findings highlight DPIN's ability to adapt to varying preferences, reduce sample complexity, and provide robust solutions in multi-objective settings.
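To make the mechanism described in the abstract concrete, the sketch below shows the general idea of per-trajectory preference estimation under stated assumptions: a small PyTorch network maps a trajectory's vector-valued return to a softmax weight distribution over objectives, which is then used to scalarize the multi-objective return for a standard policy update. The class name PreferenceInferenceNet, the network shape, and the use of vector returns as input are illustrative assumptions, not the authors' published DPIN architecture.

```python
# Minimal sketch of preference estimation for MORL scalarization.
# Assumption: the estimator consumes a trajectory's vector return; the
# actual DPIN input features and architecture are not specified here.
import torch
import torch.nn as nn


class PreferenceInferenceNet(nn.Module):
    """Hypothetical preference estimator: trajectory features -> weights."""

    def __init__(self, num_objectives: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_objectives, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_objectives),
        )

    def forward(self, traj_return: torch.Tensor) -> torch.Tensor:
        # Softmax keeps the estimated preference on the probability simplex,
        # so the weights are nonnegative and sum to 1 per trajectory.
        return torch.softmax(self.net(traj_return), dim=-1)


# Usage: scalarize a batch of vector returns with the inferred preferences.
num_objectives = 3
pin = PreferenceInferenceNet(num_objectives)
vector_returns = torch.randn(8, num_objectives)  # one row per trajectory
weights = pin(vector_returns)                    # (8, 3), rows sum to 1
scalarized = (weights * vector_returns).sum(-1)  # scalar utility per trajectory
```

Scalarizing with estimated rather than predefined weights is what lets each sampled trajectory contribute to the policy update under the preference most favorable to it, which is the sample-efficiency argument the abstract makes.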
