
YOLACTFusion: An instance segmentation method for RGB-NIR multimodal image fusion based on an attention mechanism

Document type: Foreign-language journal article

Authors: Liu, Cheng 1; Feng, Qingchun 1; Sun, Yuhuan 1; Li, Yajun 1; Ru, Mengfei 1; Xu, Lijia 2

Author affiliations: 1.Beijing Acad Agr & Forestry Sci, Intelligent Equipment Res Ctr, Beijing 100097, Peoples R China

2.Sichuan Agr Univ, Coll Mech & Elect Engn, Yaan 625014, Peoples R China

3.Beijing Key Lab Intelligent Equipment Technol Agr, Beijing 100097, Peoples R China

Keywords: Multimodal fusion; Attention mechanism; YOLACT; Tomato main-stem; Multimodal loss function

Journal: COMPUTERS AND ELECTRONICS IN AGRICULTURE (Impact Factor: 8.3; 5-Year Impact Factor: 8.3)

ISSN: 0168-1699

Year/Volume: 2023, Vol. 213

Pages:

Indexed in: SCI

Abstract: The tomato plant's main stem is a feasible visual cue for robots searching for the discretely growing targets of harvesting, pruning, or pollination. Owing to the pronounced reflection characteristics of the main stem in the near-infrared (NIR) waveband, this study proposes an attention-based multimodal hierarchical fusion method (YOLACTFusion) to achieve instance segmentation of the main stem against similarly colored surroundings (i.e., green leaves and green fruit) in robotic vision systems. The model feeds RGB images and 900-1100 nm NIR images into two ResNet50 backbone networks and uses a parallel attention mechanism to fuse feature maps of various scales into the head network, improving main-stem segmentation on RGB images. The loss function for the multimodal images weights the original loss on the RGB image together with the position-offset loss and classification loss on the NIR image. Furthermore, local depthwise separable convolution is used in the backbone network, and Conv-BN layers are merged to reduce computational complexity. The results show that the precision and recall of YOLACTFusion for main-stem detection reached 93.90 % and 62.60 %, respectively, and the precision and recall of instance segmentation reached 95.12 % and 63.41 %, respectively. Compared with YOLACT, the mean average precision (mAP) of YOLACTFusion increased from 39.20 % to 46.29 % and the model size was reduced from 199.03 MB to 165.52 MB, while image-processing efficiency remained similar. Overall, the multimodal instance segmentation method proposed in this study significantly improves the detection and segmentation of tomato main stems against a similarly colored background, making it a promising approach for improving agricultural robots' visual perception.
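The Conv-BN merging mentioned in the abstract is a standard inference-time optimization: the batch-normalization scale and shift are folded into the weights and bias of the preceding convolution, so the two layers collapse into one. The paper does not give its implementation; the following is a minimal NumPy sketch of the usual folding formula, with hypothetical parameter names:

```python
import numpy as np

def merge_conv_bn(conv_w, conv_b, gamma, beta, mean, var, eps=1e-5):
    """Fold a BatchNorm layer into the preceding convolution.

    conv_w: (out_ch, in_ch, kh, kw) convolution weights
    conv_b: (out_ch,) convolution bias
    gamma, beta, mean, var: (out_ch,) BN affine parameters and running statistics
    Returns fused (weights, bias) so one convolution computes conv -> BN.
    """
    scale = gamma / np.sqrt(var + eps)            # per-channel BN scaling
    w_fused = conv_w * scale[:, None, None, None] # scale each output-channel filter
    b_fused = (conv_b - mean) * scale + beta      # shift the bias accordingly
    return w_fused, b_fused
```

Because BN at inference is a per-channel affine map, the fused layer is numerically identical to running convolution and BN separately, while saving one pass over the feature map.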
