MAPoseNet: Animal pose estimation network via multi-scale convolutional attention

Document type: Foreign-language journal article

First author: Liu, Sicong

Authors: Liu, Sicong; Fan, Qingcheng; Li, Shuqin; Zhao, Chunjiang

Author affiliations:

Keywords: Animal pose estimation; Attention mechanism; Asymmetric convolution; Feature pyramid

Journal: JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION (impact factor: 2.6; five-year impact factor: 2.8)

ISSN: 1047-3203

Year, volume, issue: 2023, Vol. 97

Pages:

Indexed in: SCI

Abstract: Animal pose estimation serves as an upstream task for recognizing and understanding animal behavior. Over the last year, the accuracy of deep learning-based methods has steadily improved, but at the expense of inference speed. This paper proposes an efficient and powerful model that improves both inference speed and accuracy. The model adopts the classic encoder-decoder architecture, builds on a feature pyramid and a multi-scale asymmetric convolution attention mechanism, and is named MAPoseNet (Animal Pose Estimation Network Via Multi-scale Convolutional Attention). Rather than typical self-attention, the encoder's attention mechanism comprises multi-scale, asymmetric convolutions, which are lightweight and instrumental in improving inference speed. The decoder consists of a feature pyramid and a feature balance module. MAPoseNet is trained and tested on the public dataset AP-10K. A series of experiments demonstrates that MAPoseNet achieves cutting-edge performance: it outperforms HRFormer by 1.3 AP and 0.8 AR, with 33.7% fewer FLOPs and 66% faster inference speed. The model also surpasses HRNet and HRFormer on the Animal Pose dataset. MAPoseNet thus improves inference speed and accuracy at the same time.
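
The abstract calls the asymmetric convolutions lightweight. A rough way to see why is to compare a depthwise k x k kernel with a 1 x k kernel followed by a k x 1 kernel, which spans the same k x k window with 2k weights per channel instead of k squared. The sketch below only does that arithmetic; the kernel sizes (7, 11, 21) and all function names are assumptions for illustration and are not taken from the MAPoseNet paper.

```python
# Illustrative cost comparison only. The kernel sizes are hypothetical
# and are NOT taken from the MAPoseNet paper.

ASSUMED_KERNEL_SIZES = (7, 11, 21)


def square_kernel_weights(k: int) -> int:
    """Weights per channel in a depthwise k x k convolution."""
    return k * k


def asymmetric_pair_weights(k: int) -> int:
    """Weights per channel in a depthwise 1 x k convolution followed by a k x 1 one."""
    return 2 * k


def branch_savings(kernel_sizes):
    """Return (k, square weights, asymmetric weights, fraction saved) per branch."""
    rows = []
    for k in kernel_sizes:
        square = square_kernel_weights(k)
        asymmetric = asymmetric_pair_weights(k)
        rows.append((k, square, asymmetric, 1 - asymmetric / square))
    return rows


if __name__ == "__main__":
    for k, square, asymmetric, saved in branch_savings(ASSUMED_KERNEL_SIZES):
        print(f"k={k}: {square} vs {asymmetric} weights per channel ({saved:.0%} fewer)")
```

The same counts apply to multiplications per output pixel, so the saving grows with the kernel size. The trade-off is that a 1 x k and k x 1 pair can only represent separable kernels, not every possible k x k kernel.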

Classification number:
