Details

pLMFPPred: a novel approach for accurate prediction of functional peptides integrating embedding from pre-trained protein language model and imbalanced learning (EI indexed)

Document type: Journal article

English title: pLMFPPred: a novel approach for accurate prediction of functional peptides integrating embedding from pre-trained protein language model and imbalanced learning

Authors: Ma, Zebin[1]; Zou, Yonglin[1]; Huang, Xiaobin[1]; Yan, Wenjin[2]; Xu, Hao[1]; Yang, Jiexin[1]; Zhang, Ying[1]; Huang, Jinqi[3]

Affiliations: [1] School of Mathematics and Computer Science, Guangdong Ocean University, Guangdong, 524088, China; [2] The Institute of Pharmacology, Key Laboratory of Preclinical Study for New Drugs of Gansu Province, School of Basic Medical Sciences, Lanzhou University, Gansu, 730000, China; [3] Department of Hematology, Affiliated Hospital of Guangdong Medical University, Guangdong, 524000, China

Year: 2023

Journal: arXiv

Indexed in: EI (accession number: 20230350133)

Language: English

Keywords: Computational linguistics - Embeddings - Machine learning - Peptides

Abstract: Background: Functional peptides have the potential to treat a variety of diseases. Their good therapeutic efficacy and low toxicity make them ideal therapeutic agents. Artificial intelligence-based computational strategies can help quickly identify new functional peptides from collections of protein sequences and discover their different functions. Results: Using protein language model-based embeddings (ESM-2), we developed a tool called pLMFPPred (Protein Language Model-based Functional Peptide Predictor) for predicting functional peptides and identifying toxic peptides. We also introduced SMOTE-TOMEK data synthesis sampling and Shapley value-based feature selection to relieve data imbalance and reduce computational cost. On a validated independent test set, pLMFPPred achieved accuracy, area under the receiver operating characteristic curve (AUC-ROC), and F1-score values of 0.974, 0.99, and 0.974, respectively. Comparative experiments show that pLMFPPred outperforms current methods for predicting functional peptides. Conclusions: The experimental results suggest that pLMFPPred performs better than existing methods in terms of accuracy, AUC-ROC, and F1-score, and that it represents a new computational method for predicting functional peptides. The source code and dataset can be obtained at https://github.com/Mnb66/pLMFPPred. Copyright © 2023, The Authors. All rights reserved.
