Beijing Academy of Agriculture and Forestry Sciences Institutional Repository

Eliminating Primacy Bias in Online Reinforcement Learning by Self-Distillation


Document type: Foreign-language journal article

Authors: Li, Jingchen ¹; Shi, Haobin ²; Wu, Huarui ¹; Zhao, Chunjiang ¹; Hwang, Kao-Shing ³

Affiliations: 1. Beijing Acad Agr & Forestry Sci, Informat Technol Res Ctr, Beijing 100079, Peoples R China

2. Northwestern Polytech Univ, Sch Comp Sci, Xian 710072, Shaanxi, Peoples R China

3. Natl Sun Yat sen Univ, Dept Elect Engn, Kaohsiung 80424, Taiwan

Keywords: Online reinforcement learning; overfitting; reinforcement learning

Journal: IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS (Impact factor: 10.4; five-year impact factor: 11.2)

ISSN: 2162-237X

Year, volume, issue: 2024

Pages:

Indexed in: SCI

Abstract: Excessive invalid exploration at the beginning of training exposes deep reinforcement learning to the risk of overfitting, which leads to spurious decisions that hinder the agent in subsequent states and explorations. This phenomenon is termed primacy bias in online reinforcement learning. This work systematically investigates primacy bias in online reinforcement learning, discussing its causes and analyzing its characteristics. To learn a policy that generalizes to subsequent states and explorations, we develop an online reinforcement learning framework based on knowledge distillation, termed self-distillation reinforcement learning (SDRL). It allows the agent to transfer its learned knowledge into a randomly initialized policy at regular intervals, and the new policy network replaces the original one in subsequent training. The core idea is that distilling knowledge from the trained policy into another policy can filter out biases, yielding a more generalized policy during learning. Moreover, to prevent the new policy from overfitting due to excessive distillation, we add an additional loss to the knowledge distillation process, using L2 regularization to improve generalization, and introduce a self-imitation mechanism to accelerate learning on the current experiences. Results of several experiments on DMC and Atari 100k suggest that the proposal can eliminate primacy bias for reinforcement learning methods, and that the policy after knowledge distillation helps agents reach higher scores more quickly.
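The distillation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a toy linear-softmax policy, a KL distillation loss, plain gradient descent, and made-up hyperparameters (`steps`, `lr`, `l2`), and it omits the self-imitation mechanism and the surrounding reinforcement learning loop. All function and variable names are hypothetical.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def mean_kl(p, q, eps=1e-12):
    """Mean KL(p || q) over a batch of action distributions."""
    return float(np.mean(np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=1)))


def distill(teacher_w, states, rng, steps=500, lr=0.5, l2=1e-3):
    """Fit a freshly initialised student policy to the teacher's action
    distribution on `states`, with an L2 penalty on the student weights.
    Assumption: both policies are linear-softmax maps from state to action."""
    target = softmax(states @ teacher_w)
    # Random re-initialisation: the student starts without the teacher's weights.
    student_w = 0.01 * rng.standard_normal(teacher_w.shape)
    for _ in range(steps):
        probs = softmax(states @ student_w)
        # Gradient of mean KL(target || probs) w.r.t. the weights,
        # plus the gradient of the penalty l2 * ||W||^2.
        grad = states.T @ (probs - target) / len(states) + 2.0 * l2 * student_w
        student_w -= lr * grad
    return student_w


rng = np.random.default_rng(0)
states = rng.standard_normal((256, 8))   # stand-in for replayed states
teacher_w = rng.standard_normal((8, 4))  # stand-in for the trained policy

teacher_probs = softmax(states @ teacher_w)
uniform = np.full_like(teacher_probs, 0.25)  # an untrained, uniform policy
kl_before = mean_kl(teacher_probs, uniform)

# In a periodic scheme, this student would replace the teacher for later training.
student_w = distill(teacher_w, states, rng)
student_probs = softmax(states @ student_w)
kl_after = mean_kl(teacher_probs, student_probs)
agreement = float(np.mean(teacher_probs.argmax(axis=1) == student_probs.argmax(axis=1)))

print(f"KL before: {kl_before:.4f}, KL after: {kl_after:.4f}, agreement: {agreement:.2f}")
```

In this sketch the L2 term pulls the student's weights slightly toward zero, so the student matches the teacher's action preferences without copying its parameters exactly.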

Related literature

Other papers by the authors

Research on Positioning and Navigation System of Greenhouse Mobile Robot Based on Multi-Sensor Fusion

Authors: Cheng, Bo; Li, Xiaoyue; Zhang, Ning; Song, Weitang; He, Xueying; Wu, Huarui

Keywords: agricultural greenhouse; navigation robot; multi-sensor fusion; ultra-wideband; inertial measurement unit; odometry; rangefinder

Recognition of wheat rusts in a field environment based on improved DenseNet

Authors: Chang, Shenglong; Cheng, Jinpeng; Fan, Zehua; Ma, Xinming; Li, Yong; Zhao, Chunjiang; Yang, Guijun; Yang, Xiaodong

Keywords: Plant disease; Wheat rust; Image processing; Deep learning; Computer vision (CV); DenseNet

GCVC: Graph Convolution Vector Distribution Calibration for Fish Group Activity Recognition

Authors: Zhao, Zhenxi; Zhao, Chunjiang; Yang, Xinting; Zhou, Chao; Liu, Jintao

Keywords: Fish; Feature extraction; Activity recognition; Calibration; Adhesives; Training; Convolution; Graph convolution vector calibration; fish group activity; activity feature vector calibration; fish activity dataset

Adaptive precision cutting method for rootstock grafting of melons: modeling, analysis, and validation

Authors: Chen, Shan; Zhao, Chunjiang; Jiang, Kai; Zheng, Wengang; Jia, Dongdong

Keywords: Melon; Grafting robot; Adaptive cutting; Rootstock pith cavity; Machine vision

Long-range infrared absorption spectroscopy and fast mass spectrometry for rapid online measurements of volatile organic compounds from black tea fermentation

Authors: Yang, Chongshan; Li, Guanglin; Zhao, Chunjiang; Fu, Xinglan; Jiao, Leizi; Wen, Xuelin; Lin, Peng; Duan, Dandan; Dong, Daming; Dong, Chunwang

Keywords: Black tea fermentation; Volatile organic compounds; Proton transfer reaction mass spectrometry; Fourier transform infrared spectroscopy; Principal component analysis; Extreme learning machine

Navigation line extraction algorithm for corn spraying robot based on YOLOv8s-CornNet

Authors: Guo, Peiliang; Diao, Zhihua; Ma, Shushuai; He, Zhendong; Zhao, Suna; Zhao, Chunjiang; Li, Jiangbo; Zhang, Ruirui; Yang, Ranbing; Zhang, Baohua

Keywords: agricultural robotics; computer vision; deep learning; navigation line extraction; network lightweight

An ultra-lightweight method for individual identification of cow-back pattern images in an open image set

Authors: Wang, Rong; Gao, Ronghua; Li, Qifeng; Zhao, Chunjiang; Ding, Luyu; Yu, Ligen; Ma, Weihong; Ru, Lin

Keywords: Cow-back pattern; Cow recognition; LightCowsNet; Open image set; Deep learning
