Beijing Academy of Agriculture and Forestry Sciences Institutional Repository

Cournot Policy Model: Rethinking centralized training in multi-agent reinforcement learning


Document type: Foreign-language journal

Authors: Li, Jingchen ¹ ; Yang, Yusen ¹ ; He, Ziming ² ; Wu, Huarui ¹ ; Shi, Haobin ² ; Chen, Wenbai ³

Author affiliations: 1.Beijing Acad Agr & Forestry Sci, Informat Technol Res Ctr, Beijing 100079, Peoples R China

2.Northwestern Polytech Univ, Sch Comp Sci, Xian 710072, Shaanxi, Peoples R China

3.Beijing Informat Sci & Technol Univ, Automat Sch, Beijing 100192, Peoples R China

4.Minist Agr & Rural Affairs, Key Lab Digital Village Technol, Beijing 100079, Peoples R China

Keywords: Multi-agent reinforcement learning; Machine learning; Multi-agent system

Journal: INFORMATION SCIENCES (impact factor: 6.8; five-year impact factor: 6.6)

ISSN: 0020-0255

Year, volume: 2024, vol. 677

Pages:

Indexed in: SCI

Abstract: This work studies Centralized Training and Decentralized Execution (CTDE), which is a powerful mechanism to ease multi-agent reinforcement learning. Although the centralized evaluation ensures unbiased estimates of Q-value, peers with unknown policies make the decentralized policy far from the expectation. To make progress in more stabilized and effective joint policy, we develop a novel game framework, termed Cournot Policy Model, to enhance the CTDE-based multi-agent learning. Combining the game theory and reinforcement learning, we regard the joint decision-making in a single time step as a Cournot duopoly model, and then design a Hetero Variational Auto-Encoder to model the policies of peers in the decentralized execution. With a conditional policy, each agent is guided to a stable mixed-strategy equilibrium even though the joint policy evolves over time. We further demonstrate that such an equilibrium must exist in the case of centralized evaluation. We investigate the improvement of our method on existing centralized learning methods. The experimental results on a comprehensive collection of benchmarks indicate our approach consistently outperforms baseline methods.
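For readers unfamiliar with the game the abstract refers to, the following is a minimal sketch of the textbook Cournot duopoly, not the paper's Cournot Policy Model. It assumes a linear inverse demand P = a - b(q1 + q2) and a shared constant marginal cost c; the names `best_response` and `cournot_equilibrium` and the parameter values are illustrative assumptions. Each firm repeatedly plays its best response to the other's last quantity, and the iteration settles at the known equilibrium q* = (a - c) / (3b).

```python
def best_response(q_other, a, b, c):
    """Profit-maximizing quantity against a rival producing q_other.

    Assumes inverse demand P = a - b * (q_self + q_other) and marginal cost c
    (textbook setting; parameters are illustrative, not from the paper).
    """
    return max(0.0, (a - c - b * q_other) / (2.0 * b))


def cournot_equilibrium(a, b, c, iters=100):
    """Iterate simultaneous best responses starting from zero output."""
    q1, q2 = 0.0, 0.0
    for _ in range(iters):
        # Both firms update at once, each reacting to the rival's previous quantity.
        q1, q2 = best_response(q2, a, b, c), best_response(q1, a, b, c)
    return q1, q2


if __name__ == "__main__":
    a, b, c = 10.0, 1.0, 1.0
    q1, q2 = cournot_equilibrium(a, b, c)
    closed_form = (a - c) / (3.0 * b)
    print(q1, q2, closed_form)
```

In this linear setting the gap to the equilibrium halves at every step, which is why plain best-response iteration is enough here; the paper's setting, with learned policies that evolve over time, is considerably more involved.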

Related literature

Other papers by the authors

A Large-Scale UAV Swarm Confrontation Method Based on Fuzzy Reinforcement Learning

Authors: Hu, Chunyang; Gu, Qiong; Wu, Zhao; Ning, Bin; Li, Jingchen; Yang, Yusen

Keywords: Unmanned aerial vehicle; Large-scale multi-agent systems; Multi-agent reinforcement learning

U2Net-MGP: A Lightweight and Efficient Visual Perception Algorithm for Consumer Electronic Accessories

Authors: Chen, Wenbai; Zhang, Bo; Zhao, Xin; Wang, Yiqun; Li, Jingchen; Shi, Haobin; Gou, Jianping

Keywords: Image segmentation; Consumer electronics; Feature extraction; Assembly; Accuracy; Computational modeling; Decoding; salient object segmentation; ghost convolution; polarized self-attention mechanism; multi-scale feature fusion

A large language model for multimodal identification of crop diseases and pests

Authors: Wang, Yiqun; Wang, Fahai; Chen, Wenbai; Lv, Bowen; Liu, Mengchen; Kong, Xiangyuan; Pan, Zhaocen; Zhao, Chunjiang

Keywords: Large language model; Crop disease identification; Agricultural questions and answers; Multimodal

A Joint Knowledge Extraction Model for Tobacco Pest and Disease Prevention Based on BERT plus BA plus CASREL

Authors: Liu, Kehan; Zhang, Feng; Wu, Qiulan; Sun, Ziruo; Sun, Xiang; Wu, Huarui

Keywords: Tobacco pest and disease prevention; knowledge extraction; joint knowledge extraction model

An Improved iTransformer with RevIN and SSA for Greenhouse Soil Temperature Prediction

Authors: Wang, Fahai; Wang, Yiqun; Chen, Wenbai; Zhao, Chunjiang

Keywords: time-series prediction; iTransformer; singular spectrum analysis; reversible instance normalization; greenhouse control

Swin-Unet plus plus : a study on phenotypic parameter analysis of cabbage seedling roots

Authors: Li, Hongda; Zhao, Yue; Bi, Zeyang; Hao, Peng; Wu, Huarui; Zhao, Chunjiang

Keywords: Cabbage; Root phenotype; Attention mechanism; Semantic segmentation; Unet; Residual networks

A high-efficiency regulation method for optimal root zone temperature under different nitrogen fertilizer using discrete curvature

Authors: Li, Huimin; Gao, Pan; Sun, Zhangtong; Hu, Jin; Wei, Ziyuan; Lu, Miao; Wu, Huarui

Keywords: U-chord curvature; Chlorophyll fluorescence; Suitable RZT range; Dynamic regulation; Hydroponic tomato seedlings
