A hybrid machine learning model with attention mechanism and multidimensional multivariate feature coding for essential gene prediction

Document type: Foreign-language journal

First author: Wu, Yan

Authors: Wu, Yan; Li, Tan; Li, Mengshan; Xie, Xiaojun; Zhou, Weihong; Sheng, Sheng; Wang, Jun; Wu, Fu-an; Fu, Yu

Author affiliations:

Keywords: Essential gene; Machine learning; Attention mechanism; LSTM; CNN; Feature coding

Journal: BMC Biology (impact factor: 4.5; five-year impact factor: 5.4)

ISSN:

Year, volume, issue: 2025, vol. 23, issue 1

Pages:

Indexed in: SCI

Abstract: Background: Essential genes are crucial for the development, inheritance, and survival of species. Studying these genes can reveal the mechanisms underlying fundamental life processes and identify potential therapeutic targets for various diseases, so identifying essential genes is important. Machine learning has become the mainstream approach for essential gene prediction. However, some key challenges remain, such as the extraction of genetic features, the impact of imbalanced data, and cross-species generalization ability.

Results: Here, we proposed a hybrid machine learning model based on graph convolutional neural networks (GCN) and bi-directional long short-term memory (Bi-LSTM) with an attention mechanism and multidimensional multivariate feature coding for essential gene prediction, called EGP Hybrid-ML. In the model, GCN was used to extract feature encoding information from graphical representations of gene sequences, and the attention mechanism was combined with Bi-LSTM to assess the importance of each feature in gene sequences. The influences of different feature encoding methods and of data imbalance were also analyzed. Additionally, the cross-species predictive performance of the model was evaluated through cross-validation. The results indicated that the sensitivity of the EGP Hybrid-ML model reached 0.9122.

Conclusions: The model demonstrated superior predictive performance and strong generalization capability compared with other models. The EGP Hybrid-ML model proposed in this paper has broad application prospects in bioinformatics, chemical informatics, and pharmaceutical informatics. The code, architectures, parameters, and datasets of the proposed model are freely available on GitHub (https://github.com/gnnumsli/EGP-Hybrid-ML).
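The sensitivity reported in the abstract is the fraction of truly essential genes that a classifier labels as essential. The minimal Python sketch below shows why this metric matters when data are imbalanced: accuracy can look high while many essential genes are missed. It is not the authors' evaluation code; the function names and the toy labels (1 = essential, 0 = non-essential) are hypothetical and chosen only for illustration.

```python
def sensitivity(y_true, y_pred):
    """Sensitivity (recall, true positive rate): TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp + fn == 0:
        raise ValueError("sensitivity is undefined without positive examples")
    return tp / (tp + fn)


def accuracy(y_true, y_pred):
    """Fraction of all predictions that match the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Hypothetical imbalanced toy data: 2 essential genes among 10.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]

print("accuracy:", accuracy(y_true, y_pred))
print("sensitivity:", sensitivity(y_true, y_pred))
```

In this toy case only one label is wrong, yet half of the essential genes are missed, which is why sensitivity is commonly reported alongside accuracy for imbalanced problems.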

Classification number:
