2021
DOI: 10.1101/2021.06.09.447770
Preprint

SignalP 6.0 achieves signal peptide prediction across all types using protein language models

Abstract: Signal peptides (SPs) are short amino acid sequences that control protein secretion and translocation in all living organisms. As experimental characterization of SPs is costly, prediction algorithms are applied to predict them from sequence data. However, existing methods are unable to detect all known types of SPs. We introduce SignalP 6.0, the first model capable of detecting all five SP types. Additionally, the model accurately identifies the positions of regions within SPs, revealing the defining biochemi…

Cited by 26 publications (25 citation statements)
References: 35 publications

“…Presence of the signal peptides in the analyzed proteins was assessed with the SignalP version 5.0 and further confirmed with newly released SignalP 6.0 [100,101].…”
Section: Bioinformatic Analysis (mentioning, confidence: 99%)

“…Pretrained transformer protein MLMs contain structural information [Rao et al, 2019, Rives et al, 2021, Chowdhury et al, 2021], encode evolutionary trajectories [Hie et al, 2022a, 2021], are zero-shot predictors of mutation fitness effects [Meier et al, 2021], improve out-of-domain generalization on protein engineering datasets [Dallago et al, 2021], and suggest improved sequences for engineering [Hie et al, 2022b]. Protein MLMs are now incorporated into the latest machine-learning methods for detecting signal peptides [Teufel et al, 2021] and predicting intracellular localization [Thumuluri et al, 2022].…”
Section: Introduction (mentioning, confidence: 99%)

“…Protein domains and families were identified using pfam_scan.pl (v1.6; http://ftp.ebi.ac.uk/pub/databases/Pfam/Tools/) on Pfam database (release 34.0; Mistry et al, 2021). Additional annotations were carried out as follows: carbohydrates active enzymes were determined using dbCAN2 (v2.0.11; Zhang et al, 2018); secondary metabolites detection via antiSMASH (v6.0; Blin et al, 2021); fungal effectors were predicted using EffectorP (v3.0; Sperschneider & Dodds, 2022) on amino acid sequences which passed the signal peptide prediction via signalP (v6.0; Teufel et al, 2021); and genes related to pathogen-host interaction were determined via PHI-base (v4.11; Urban et al, 2020).…”
Section: Methods (mentioning, confidence: 99%)

“…Detailed information such as sequencing platforms, library kits used and sequence accession number for each sample can be found in … Pfam database (release 34.0; Mistry et al, 2021). Additional annotations were carried out as follows: carbohydrates active enzymes were determined using dbCAN2 (v2.0.11; Zhang et al, 2018); secondary metabolites detection via antiSMASH (v6.0; Blin et al, 2021); fungal effectors were predicted using EffectorP (v3.0; Sperschneider & Dodds, 2022) on amino acid sequences which passed the signal peptide prediction via signalP (v6.0; Teufel et al, 2021);…”
Section: Nucleic Acids Isolation Genome and Transcriptome Sequencing (mentioning, confidence: 99%)