2021
DOI: 10.1101/2021.07.29.454330
Preprint

Discovering molecular features of intrinsically disordered regions by using evolution for contrastive learning

Abstract: A major challenge to the characterization of intrinsically disordered regions (IDRs), which are widespread in the proteome but relatively poorly understood, is the identification of molecular features, such as short motifs, amino acid repeats, and physicochemical properties, that mediate the functions of these regions. Here, we introduce a proteome-scale feature discovery method for IDRs. Our method, which we call "reverse homology", exploits the principle that important functional features are conserved over e…

References: 134 publications
Cited by 8 publications (10 citation statements)
“…With this in mind, a general model for IDR evolution has emerged in which substantial sequence variation can be tolerated assuming SLiMs are conserved or the amino acid composition and patterning (i.e. bulk sequence properties) are maintained (5, 23–25).…”
Section: Main Text · mentioning
confidence: 99%
“…[47] Moreover, the "spacers" between the stickers are also essential factors in driving LLPS [47, 48], and current sequence alignment algorithms for IDRs cannot easily pick the importance of spacers as well. Although a recent work using machine learning approaches to collect the "features" of IDRs might be an alternative approach to find their traits [49], it is still limited in capturing the distal interaction, such as the prevailing aromatic residues.…”
mentioning
confidence: 99%
“…In previous work, we explained the theoretical principles behind using evolutionary homology as a basis for contrastive learning [57]: our method is expected to learn conserved features of protein sequences, which we argue are likely important for the conserved function of rapidly-diverging IDRs (S1 Methods).…”
Section: Results · mentioning
confidence: 99%
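The statement above describes using evolutionary homology as the learning signal for contrastive learning. As a rough illustration only, here is a generic InfoNCE-style loss in Python/NumPy; it is not the authors' implementation, and the exact objective used by reverse homology is not given in this excerpt. It assumes each IDR has already been embedded into a fixed-length vector by some encoder (hypothetical here), treats one homologous IDR from another species as the positive, and uses non-homologous IDRs as negatives. The function name `info_nce_loss` and the `temperature` parameter are illustrative choices.

```python
import numpy as np


def info_nce_loss(query, positive, negatives, temperature=0.1):
    """Generic InfoNCE loss for a single query (illustrative sketch, not the paper's code).

    query:     (d,) embedding of an IDR from one species (hypothetical encoder output)
    positive:  (d,) embedding of a homologous IDR from another species
    negatives: (k, d) embeddings of non-homologous IDRs
    Returns -log p(positive), where p is a softmax over the k + 1
    temperature-scaled cosine similarities to the query.
    """
    def normalize(x):
        return x / np.linalg.norm(x, axis=-1, keepdims=True)

    q = normalize(query)
    # Stack candidates so that the positive (homolog) sits at index 0.
    candidates = normalize(np.vstack([positive[None, :], negatives]))
    logits = candidates @ q / temperature
    logits = logits - logits.max()  # subtract max for numerical stability
    log_probs = logits - np.log(np.exp(logits).sum())
    return float(-log_probs[0])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 16
    query = rng.normal(size=d)
    homolog = query + 0.1 * rng.normal(size=d)  # toy "conserved" partner
    non_homologs = rng.normal(size=(8, d))       # toy unrelated sequences
    print("loss with homolog:", info_nce_loss(query, homolog, non_homologs))
    print("loss with random partner:", info_nce_loss(query, rng.normal(size=d), non_homologs))
```

A low loss means the embedding places the homolog closer to the query than any non-homolog, so minimizing it over many homolog sets would favor features shared across homologs. How the paper samples homologs and negatives is not described in this excerpt.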
“…From a technical perspective, reverse homology employs a self-supervised approach, as many emerging representation learning approaches for protein sequences do [50–54]. Unlike these methods, which are mostly based on methods adapted from natural language processing, we proposed a novel proxy task that purposes principles of evolutionary proteomics as a learning signal instead [57]. Another distinction in our study is that previous approaches primarily focus on representation learning, with the aim of optimizing the performance of the representation on downstream regression tasks reflecting protein design or classification problems.…”
Section: Discussion · mentioning
confidence: 99%