Document type: Foreign-language journal
Authors: Li, Jingchen 1 ; Shi, Haobin 2 ; Wu, Huarui 1 ; Zhao, Chunjiang 1 ; Hwang, Kao-Shing 3
Author affiliations: 1.Beijing Acad Agr & Forestry Sci, Informat Technol Res Ctr, Beijing 100079, Peoples R China
2.Northwestern Polytech Univ, Sch Comp Sci, Xian 710072, Shaanxi, Peoples R China
3.Natl Sun Yat sen Univ, Dept Elect Engn, Kaohsiung 80424, Taiwan
Keywords: Online reinforcement learning; overfitting; reinforcement learning
Journal: IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS (Impact factor: 10.4; five-year impact factor: 11.2)
ISSN: 2162-237X
Year, volume, issue: 2024
Pages:
Indexed in: SCI
Abstract: Excessive invalid explorations at the beginning of training put the deep reinforcement learning process at risk of overfitting, which in turn produces spurious decisions that hinder agents in subsequent states and explorations. This phenomenon is termed primacy bias in online reinforcement learning. This work systematically investigates primacy bias in online reinforcement learning, discussing its causes and analyzing its characteristics. In addition, to learn a policy that generalizes to subsequent states and explorations, we develop an online reinforcement learning framework based on knowledge distillation, termed self-distillation reinforcement learning (SDRL). It allows the agent to transfer its learned knowledge into a randomly initialized policy at regular intervals, and the new policy network then replaces the original one in subsequent training. The core idea of this work is that distilling knowledge from the trained policy into another policy can filter out biases, generating a more generalized policy during learning. Moreover, to prevent the new policy from overfitting due to excessive distillation, we add an extra loss to the knowledge distillation process, using L2 regularization to improve generalization, and we introduce a self-imitation mechanism to accelerate learning on the current experiences. The results of several experiments on DMC and Atari 100k suggest that the proposal can eliminate primacy bias for reinforcement learning methods, and that the policy after knowledge distillation helps agents reach higher scores more quickly.
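The distillation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the KL-divergence form of the distillation term, the function names, the `l2_coef` value, and the fixed-interval schedule are all assumptions made for illustration, and the self-imitation term is omitted.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def distillation_loss(teacher_logits, student_logits, student_params, l2_coef=1e-3):
    """Hypothetical loss: match the trained (teacher) policy's action
    distribution while penalizing large student weights via L2."""
    kl = kl_divergence(softmax(teacher_logits), softmax(student_logits))
    l2 = sum(w * w for w in student_params)
    return kl + l2_coef * l2

def should_distill(step, interval):
    """Assumed schedule: distill into a fresh policy every `interval` steps."""
    return step > 0 and step % interval == 0
```

In a training loop under these assumptions, `should_distill` would trigger the creation of a randomly initialized student policy, which is trained on `distillation_loss` and then replaces the original policy network.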
- Related literature
Other papers by the authors
-
Recognition of maize seedling under weed disturbance using improved YOLOv5 algorithm
Authors: Tang, Boyi; Zhao, Chunjiang; Zhou, Jingping; Pan, Yuchun; Qu, Xuzhou; Cui, Yanglin; Liu, Chang; Li, Xuguang; Gu, Xiaohe
Keywords: Object detection; Maize seedlings; UAV RGB images; YOLOv5; Attention mechanism
-
Boosting Cost-Efficiency in Robotics: A Distributed Computing Approach for Harvesting Robots
Authors: Xie, Feng; Li, Tao; Feng, Qingchun; Chen, Liping; Zhao, Chunjiang; Zhao, Hui
Keywords: 5G network; computation allocation; edge computing; harvesting robot; visual system
-
Genotyping Identification of Maize Based on Three-Dimensional Structural Phenotyping and Gaussian Fuzzy Clustering
Authors: Xu, Bo; Zhao, Chunjiang; Yang, Guijun; Zhang, Yuan; Liu, Changbin; Feng, Haikuan; Yang, Xiaodong; Yang, Hao
Keywords: tassel; 3D phenotyping; TreeQSM; genotyping; clustering
-
High-throughput phenotyping techniques for forage: Status, bottleneck, and challenges
Authors: Cheng, Tao; Zhang, Dongyan; Wang, Zhaoming; Zhang, Gan; Yuan, Feng; Liu, Yaling; Wang, Tianyi; Ren, Weibo; Zhao, Chunjiang
Keywords: Forage; High-throughput phenotyping; Precision identification; Sensors; Artificial intelligence; Efficient breeding
-
Enhancing potato leaf protein content, carbon-based constituents, and leaf area index monitoring using radiative transfer model and deep learning
Authors: Feng, Haikuan; Fan, Yiguang; Ma, Yanpeng; Liu, Yang; Chen, Riqiang; Bian, Mingbo; Fan, Jiejie; Yang, Guijun; Zhao, Chunjiang; Yue, Jibo; Fu, Yuanyuan; Leng, Mengdie; Jin, Xiuliang; Zhao, Yu
Keywords: Potato; Deep learning; Radiative transfer model; Transfer learning; Leaf protein content
-
Revolutionizing Crop Breeding: Next-Generation Artificial Intelligence and Big Data-Driven Intelligent Design
Authors: Zhang, Ying; Guo, Xinyu; Zhao, Chunjiang; Huang, Guanmin; Lu, Xianju; Wang, Yanru; Wang, Chuanyu; Zhao, Yanxin
Keywords: Crop breeding; Next-generation artificial intelligence; Multiomics big data; Intelligent design breeding
-
Water phase distribution and its dependence on internal structure in soaking maize kernels: a study using low-field nuclear magnetic resonance and X-ray micro-computed tomography
Authors: Wang, Baiyan; Zhao, Chunjiang; Gu, Shenghao; Wang, Juan; Wang, Guangtao; Guo, Xinyu
Keywords: phenotyping; hydration; water absorption; seed emergence; kernel moisture



