
A Comprehensive Evaluation of Monocular Depth Estimation Methods in Low-Altitude Forest Environment

Document type: Foreign-language journal article

Authors: Jia, Jiwen 1 ; Kang, Junhua 1 ; Chen, Lin 2 ; Gao, Xiang 1 ; Zhang, Borui 1 ; Yang, Guijun 1

Author affiliations: 1.Changan Univ, Coll Geol Engn & Geomat, Xian 710054, Peoples R China

2.VISCODA GmbH, Schneiderberg 32, D-30167 Hannover, Germany

3.Beijing Acad Agr & Forestry Sci, Informat Technol Res Ctr, Key Lab Quantitat Remote Sensing Agr, Minist Agr & Rural Affairs, Beijing 100097, Peoples R China

Keywords: monocular depth estimation; CNN; vision transformer; forest environment; comparative study

Journal: REMOTE SENSING (impact factor: 4.1; five-year impact factor: 4.8)

ISSN:

Year, volume, issue: 2025, Volume 17, Issue 4

Pages:

Indexed in: SCI

Abstract: Monocular depth estimation (MDE) is a critical computer vision task that enhances environmental perception in fields such as autonomous driving and robot navigation. In recent years, deep learning-based MDE methods have achieved notable progress in these fields. However, robust monocular depth estimation in low-altitude forest environments remains challenging, particularly in scenes with dense and cluttered foliage, which complicates applications in environmental monitoring, agriculture, and search and rescue operations. This paper presents a comprehensive evaluation of state-of-the-art deep learning-based MDE methods on low-altitude forest datasets. The evaluated models include both self-supervised and supervised approaches and employ different network structures, such as convolutional neural networks (CNNs) and Vision Transformers (ViTs). We assess the generalization of these approaches across diverse low-altitude scenarios, focusing specifically on forested environments. A systematic set of evaluation criteria is employed, comprising traditional image-based global statistical metrics as well as geometry-aware metrics, to provide a more comprehensive evaluation of depth estimation performance. The results indicate that most Transformer-based models, such as DepthAnything and Metric3D, outperform traditional CNN-based models in complex forest environments by capturing detailed tree structures and depth discontinuities. Conversely, CNN-based models like MiDaS and AdaBins struggle with depth discontinuities and complex occlusions, yielding less detailed predictions. On the Mid-Air dataset, the Transformer-based DepthAnything achieves a 54.2% improvement in RMSE, a global error metric, over the CNN-based AdaBins. On the LOBDM dataset, the CNN-based MiDaS has a depth edge completeness error of 93.361, while the Transformer-based Metric3D has a significantly lower error of only 5.494.
These findings highlight the potential of Transformer-based approaches for monocular depth estimation in low-altitude forest environments, with implications for high-throughput plant phenotyping, environmental monitoring, and other forest-specific applications.
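
The abstract reports a relative RMSE improvement between two models. As a rough illustration of how such a figure could be computed, the sketch below evaluates RMSE over valid pixels and the percentage reduction relative to a baseline. The function names (`rmse`, `relative_improvement`), the rule that non-positive ground truth marks invalid pixels, and the toy depth values are all assumptions made for this example. They are not taken from the paper, which may define its masking and improvement figure differently.

```python
import numpy as np


def rmse(pred, gt, valid_mask=None):
    """Root-mean-square error between predicted and ground-truth depth maps."""
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    if valid_mask is None:
        # Assumption: non-finite or non-positive ground truth means "no measurement".
        valid_mask = np.isfinite(gt) & (gt > 0)
    diff = pred[valid_mask] - gt[valid_mask]
    return float(np.sqrt(np.mean(diff ** 2)))


def relative_improvement(baseline_error, new_error):
    """Percentage reduction of an error metric relative to a baseline value."""
    return 100.0 * (baseline_error - new_error) / baseline_error


# Toy 2x2 depth maps in metres (made-up values, not results from the paper).
gt = np.array([[2.0, 4.0], [0.0, 8.0]])          # 0.0 marks an invalid pixel
pred_baseline = np.array([[4.0, 6.0], [1.0, 10.0]])  # off by 2 m on valid pixels
pred_better = np.array([[3.0, 5.0], [1.0, 9.0]])     # off by 1 m on valid pixels

baseline_rmse = rmse(pred_baseline, gt)
better_rmse = rmse(pred_better, gt)
improvement = relative_improvement(baseline_rmse, better_rmse)
print(f"baseline RMSE={baseline_rmse:.3f}, better RMSE={better_rmse:.3f}, "
      f"improvement={improvement:.1f}%")
```

Under this definition, an improvement of 54.2% would mean the better model's RMSE is 45.8% of the baseline's. Whether the paper uses exactly this convention is not stated in the abstract.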
