2017
DOI: 10.1126/science.aah4043

Protein structure determination using metagenome sequence data

Abstract: Despite decades of work by structural biologists, there are still ~5200 protein families with unknown structure outside the range of comparative modeling. We show that Rosetta structure prediction guided by residue-residue contacts inferred from evolutionary information can accurately model proteins that belong to large families, and that metagenome sequence data more than triples the number of protein families with sufficient sequences for accurate modeling. We then integrate metagenome data, contact based st…

Cited by 492 publications (594 citation statements)
References 57 publications
“…Although application of these free energies requires the use of a structure, this is changing because of recent advances in predicting contact pairs from sequence correlations (66,67). Improved next-generation sequencing technologies are leading to the rapid growth in protein sequence data so that these sequence-based predictions of contacts may eventually overwhelm structure determination studies.…”
Section: Discussion (mentioning)
confidence: 99%
“…By comparing residue-residue coevolution strengths computed from an alignment of Cyc2-like proteins with the Gremlin server, 7,8 it appears that the predicted contacts match best with the contacts present in model #3 (Fig. 2B and different cutoffs of the coevolution data in Fig.…”
Section: Results (mentioning)
confidence: 99%
“…Ovchinnikov and co-workers have used Big Data techniques to better predict 3D protein structures. [28] Big Data tools are also valuable in chemical toxicology, where the use of high-throughput screening produces both structured and unstructured information that is so large and complex that it is difficult to analyze using traditional methods. [29] Just managing the chemistry data (in terms of chemical compounds and challenges of chemical structure detail vs. the myriad associated identifiers) is enough of a problem.…”
Section: Unit (mentioning)
confidence: 99%