DepthFormer: A High-Resolution Depth-Wise Transformer for Animal Pose Estimation

Document type: Foreign-language journal article

First author: Liu, Sicong

Authors: Liu, Sicong; Fan, Qingcheng; Liu, Shanghao; Zhao, Chunjiang

Author affiliations:

Keywords: animal pose estimation; DepthFormer; multi-resolution representations; depthwise convolution

Journal: AGRICULTURE-BASEL (Impact factor: 3.408; 5-year impact factor: 3.459)

ISSN:

Year, volume, issue: 2022, Vol. 12, Issue 8

Pages:

Indexed in: SCI

Abstract: Animal pose estimation has important value in both theoretical research and practical applications, such as zoology and wildlife conservation. To address the difficulty of running large-scale multi-animal pose estimation models with limited computing resources, this study proposes DepthFormer, a simple but effective high-resolution Transformer model for animal pose estimation. The model uses a multi-branch parallel design that maintains high-resolution representations throughout the process. Building on two similarities between self-attention and depthwise convolution, namely sparse connectivity and weight sharing, we combine the structure of the Transformer with representative batch normalization to design a new basic block that reduces the number of parameters and the amount of computation required. In addition, four PoolFormer blocks are introduced after the parallel network to maintain good performance. Benchmark evaluation is performed on the public AP-10K dataset, which contains 23 animal families and 54 species, and the results are compared with six other state-of-the-art pose estimation networks. The results demonstrate that DepthFormer outperforms other popular lightweight networks (e.g., Lite-HRNet and HRFormer-Tiny) on this task. This work can provide effective technical support for accurately estimating animal poses with limited computing resources.
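
The abstract credits depthwise convolution with reducing the parameter count. As a rough illustration of why that can hold, the sketch below compares the weight counts of a standard convolution and a depthwise convolution followed by a 1x1 pointwise convolution. This is not the DepthFormer basic block itself: the function names, the channel width of 64, and the 3x3 kernel are assumptions chosen for illustration, and bias terms are ignored.

```python
def conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_params(channels: int, k: int) -> int:
    """Weights in a depthwise k x k convolution: one k x k filter per channel."""
    return k * k * channels


def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Depthwise k x k convolution followed by a 1 x 1 pointwise convolution."""
    return depthwise_params(c_in, k) + c_in * c_out


# Hypothetical layer size, not taken from the paper.
CHANNELS = 64
KERNEL = 3

standard = conv_params(CHANNELS, CHANNELS, KERNEL)
depthwise = depthwise_params(CHANNELS, KERNEL)
separable = depthwise_separable_params(CHANNELS, CHANNELS, KERNEL)

print(f"standard conv:        {standard} weights")
print(f"depthwise conv only:  {depthwise} weights")
print(f"depthwise + 1x1 conv: {separable} weights")
```

Under these assumed sizes, the depthwise layer alone needs one filter per channel rather than one per input-output channel pair, which is the sparse-connectivity property the abstract mentions.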

Classification number:
