2021
DOI: 10.7554/elife.60220

Identifying molecular features that are associated with biological function of intrinsically disordered protein regions

Abstract: In previous work, we showed that intrinsically disordered regions (IDRs) of proteins contain sequence-distributed molecular features that are conserved over evolution, despite little sequence similarity that can be detected in alignments (Zarin et al. 2019). Here, we aim to use these molecular features to predict specific biological functions for individual IDRs and identify the molecular features within them that are associated with these functions. We find that the predictable functions are diverse. Examinin…

Cited by: 76 publications (84 citation statements)
References: 67 publications
“…The first possibility would reflect a differential “folding propensity” that is inherently encoded in the amino-acid sequences of high vs. low pLDDT-scoring IDRs, whereas the latter two possibilities would influence the AlphaFold2 prediction confidence due to the depth of the MSAs (2) or sequence similarity to the structures from the PDB used in training (3) (Jumper et al 2021a,b). Given the relatively poor coverage of IDRs in the PDB (Quaglia et al 2021) and the poor positional alignability for most IDRs (Colak et al 2013; Nguyen Ba et al 2012; Zarin et al 2019, 2021), it is plausible that some combination of all three of the aforementioned possibilities could contribute to high pLDDT scoring IDRs.…”
Section: Results
Citation type: mentioning
confidence: 99%
“…While the structural predictions generated by AlphaFold2 will certainly accelerate the pace of biomedical discovery, there remains a huge need for experimental (Bhowmick et al 2016) and bioinformatic (Zarin et al 2019, 2021) approaches to address the majority of IDRs that likely function in the absence of folded structure. With increased experimental data on IDRs/IDPs, including integrative structural modelling (Bottaro et al 2020; Choy & Forman-Kay 2001; Gomes et al 2020; Krzeminski et al 2013; Lincoff et al 2020; Ozenne et al 2012; Salmon et al 2010), machine-learning methods promise to provide new insights into disordered protein conformational states and functional mechanisms (Lindorff-Larsen & Kragelund 2021).…”
Section: Discussion
Citation type: mentioning
confidence: 99%
“…Based on prior work, we anticipated that conservation in IDRs could be considered in terms of compositional and linear sequence conservation (5, 16, 18, 58). Using this framework we can identify proteins that are well conserved in terms of linear sequence (and hence composition), by composition alone, or by neither (Fig.…”
Section: Results
Citation type: mentioning
confidence: 99%
“…For example, we compared the predictive power (Figure 3C) of these representations to predict two highly-specific IDR functions in yeast, mitochondrial targeting signals and direct Cdc28 phosphorylation [13]. On both tasks we find that Unirep [52] performs best (Supplementary Table 2 in Supplementary File 4) achieving 5-fold cross-validation AUC of 0.9 on both tasks.…”
Section: Results
Citation type: mentioning
confidence: 99%
“…In contrast, the features for Cdc28 targets are mostly motifs, with the top ranked feature matching the specificity of a proline-directed kinase (such as Cdc28) and the average pool feature is rich in serines and prolines, consistent with known multisite phosphorylation in these substrates [20]. Thus, predictive models based on reverse homology features appear as interpretable as those based on knowledge based features [13], in contrast to features obtained from language models [50], [52].…”
Section: Results
Citation type: mentioning
confidence: 99%