Integrative denoising and feature extraction method (D-FE) for improving low-quality Raman data

Document type: Foreign-language journal

First author: Wang, Chunjie

Authors: Wang, Chunjie; Zhao, Xiaoyu; Zhao, Yue; Cai, Lijing; Tong, Liang; Wang, Baicheng

Author affiliations:

Keywords: Raman spectroscopy; SHapley Additive exPlanations; Density functional theory; Preprocessing; Feature extraction

Journal: MICROCHEMICAL JOURNAL (Impact factor: 5.1; 5-year impact factor: 4.7)

ISSN: 0026-265X

Year, volume, issue: 2025, vol. 210

Pages:

Indexed in: SCI

Abstract: For online measurement of paddy field water, an Integrative Denoising and Feature Extraction (D-FE) method is proposed to overcome the modeling challenges caused by strong interference from light, electrical, and mechanical sources, which degrades data quality. The D-FE method uses Raman fingerprint features as key indicators. It first distinguishes noise from Raman signals by calculating the coefficient of variation between candidate and actual key features, and filters out suspected noise. It then examines the importance of the remaining features, constructs a regression model based on their significance, and identifies the critical features through model evaluation feedback, completing both data preprocessing and feature extraction. In this study, an XG-Boost measurement model for dissolved oxygen (DO) and pH was developed using the D-FE method and optimized with Optuna. Experimental results show that the D-FE + Optuna + XG-Boost model (R2p of 0.9305 and 0.9025; RMSEP of 0.6060 and 0.2488; RPD of 4.4325 and 3.2023) consistently outperforms various classical algorithm combinations (preprocessing: S-G 1st derivative, SNV, MSC, Baseline, S-G 2nd derivative, airPLS; feature extraction: SPA, CARS, UVE, IRF; regression methods: PLSR, PCR, BP, RBF). Even in simulated tests involving environmental temperature fluctuations, sunlight intensity variations, light source attenuation due to power consumption, and optical component degradation from extended use, the proposed D-FE + Optuna + XG-Boost model maintained the highest coefficient of determination, the lowest root mean square error of prediction, and the smallest fluctuations. This research expands the use of Raman feature peaks and introduces the integrative D-FE method. Unlike classical modeling approaches, which often require searching through many combinations of data preprocessing and feature extraction methods, the D-FE method achieves optimal denoising and feature extraction with less effort. By developing an Optuna + XG-Boost model based on D-FE, this research enables online, pollution-free, and rapid assessment of DO and pH values in paddy field water, offering technical support for synchronized measurement, fertilization, and sowing.
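The abstract reports three prediction-set metrics (R2p, RMSEP, RPD) without defining them. As a reading aid, the sketch below computes them under the definitions commonly used in chemometrics: RMSEP as the root mean square of the prediction residuals, R2p as one minus the ratio of residual to total sum of squares, and RPD as the standard deviation of the reference values divided by RMSEP. The function name `prediction_metrics`, the use of the sample standard deviation (`ddof=1`), and the example values are assumptions for illustration; the record does not state which conventions the authors used.

```python
import numpy as np

def prediction_metrics(y_true, y_pred):
    """Return R2p, RMSEP and RPD for a prediction set (assumed definitions)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred

    # Root mean square error of prediction
    rmsep = float(np.sqrt(np.mean(residuals ** 2)))

    # Coefficient of determination on the prediction set
    ss_res = float(np.sum(residuals ** 2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    r2p = 1.0 - ss_res / ss_tot

    # Ratio of performance to deviation; sample SD (ddof=1) is an assumption
    rpd = float(np.std(y_true, ddof=1)) / rmsep

    return {"R2p": r2p, "RMSEP": rmsep, "RPD": rpd}

# Made-up reference and predicted values, not data from the paper
y_true = [6.0, 7.0, 8.0, 9.0, 10.0]
y_pred = [6.5, 6.5, 8.0, 9.5, 9.5]
metrics = prediction_metrics(y_true, y_pred)
print(metrics)
```

Under these definitions a higher R2p and RPD and a lower RMSEP indicate better predictive performance, which is the sense in which the abstract compares the models.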

Classification number:
