2022
DOI: 10.1093/bioinformatics/btac678

E-SNPs&GO: embedding of protein sequence and function improves the annotation of human pathogenic variants

Abstract: Motivation: The advent of massive DNA sequencing technologies is producing a huge number of human single-nucleotide polymorphisms occurring in protein-coding regions and possibly changing their sequences. Discriminating harmful protein variations from neutral ones is one of the crucial challenges in precision medicine. Computational tools based on artificial intelligence provide models for protein sequence encoding, bypassing database searches for evolutionary information. We leverage the new …

Cited by 15 publications (13 citation statements)
References 51 publications (110 reference statements)
“…We applied the trained two-stage SAV prediction model TransEFVP to the blind test set, and the results are listed in Table . At the same time, we list the most advanced SAV prediction tools, including E-SNPs&GO (one of the current state-of-the-art models dedicated to SAV prediction), PROVEAN, and MutPred2. All of the methods used the same test set for a fair performance comparison.…”
Section: Results (mentioning)
confidence: 99%
“…And we only keep the SAVs that are obviously related to the diseases listed in OMIM and MONDO. Both databases classify SAVs into the following classes: pathogenic or likely pathogenic (P/LP), benign or likely benign (B/LB), and uncertain significance (US). Overall, the data set contains 111,412 SAVs in 13 661 protein sequences, including 43 895 P/LP SAVs in 3603 proteins and 67 517 B/LB SAVs in 13 229 proteins.…”
Section: Methods (mentioning)
confidence: 99%
“…Many of the early methods were based on the prediction of the effect of a single mutation on the protein thermodynamic stability, as destabilization is one of the key factors in pathogenesis (Capriotti et al, 2008; Dehouck et al, 2011; Worth et al, 2011; Fariselli et al, 2015; Laimer et al, 2015; Quan et al, 2016; Savojardo et al, 2016; Yang et al, 2018; Marabotti et al, 2020; Pires et al, 2020; Montanucci et al, 2022). Subsequent efforts and developments in the field produced last-generation methods, using one of three general strategies: i) prediction of the likelihood of a missense mutation for causing pathogenic changes in a protein (Sim et al, 2012; Adzhubei et al, 2013; Carter et al, 2013; Katsonis et al, 2014; Niroula et al, 2015; Capriotti et al, 2017; Raimondi et al, 2017; Rentzsch et al, 2019; Pejaver et al, 2020; Manfredi et al, 2022; Quinodoz et al, 2022); ii) evolutionary conservation analysis of the mutated sites; iii) methods combining different strategies (Stein et al, 2019; Petrosino et al, 2021). More recently, several methods have been developed to also predict the impact of variants in noncoding regions (Rojano et al, 2019; Katsonis et al, 2022; Tabarini et al, 2022).…”
Section: Tools For Rare Disease Genome Interpretation (mentioning)
confidence: 99%
“…Felix et al utilized protein language models and conditional random fields to accurately predict various signal peptides, resulting in state-of-the-art results [21]. Moreover, the combinations of representations generated by self-supervised methods have been demonstrated to be effective in predicting protein structure and function [22][23]. However, there is still much to explore in terms of combining embedded representations generated by different types of self-supervised learning networks and comprehensively validating their performance.…”
Section: Introduction (mentioning)
confidence: 99%