2019
DOI: 10.1103/physreve.99.032405

Influence of multiple-sequence-alignment depth on Potts statistical models of protein covariation

Abstract: Potts statistical models have become a popular and promising way to analyze mutational covariation in protein multiple sequence alignments (MSAs) in order to understand protein structure, function and fitness. But the statistical limitations of these models, which can have millions of parameters and are fit to MSAs of only thousands or hundreds of effective sequences using a procedure known as inverse Ising inference, are incompletely understood. In this work we predict how model quality degrades as a function…

Citations: cited by 27 publications (44 citation statements)
References: 46 publications (115 reference statements)

“…The methodology followed in this analysis is similar to the one followed in Flynn et al (2017) for HIV-1 PR (for further details on derivation and description of the model parameters see their Materials and methods section as well as the SI). The sample size of the MSA plays a critical role in determining the quality and effectiveness of the model (Haldane and Levy, 2019) and we confirm that the models are fit using sufficient data with minimal overfitting.…”
Section: Methods (supporting)
confidence: 63%
“…However, the different types of predictions based on Potts models are differently affected by finite sampling error; predictions of the effect of point mutations to a sequence, ΔE, which forms the basis of this study are the most robust and least affected by finite sampling. Using the in silico tests suggested in Haldane and Levy (2019), we find that the effects of point mutations are accurately captured even in the IN model, which is the most susceptible to finite sampling errors among our three models (for a more detailed analysis of the effects of finite sampling on the predictions of the Potts model, we refer the reader to Haldane and Levy, 2019). Thus, we conclude that the MSA sample sizes for PR, RT, and IN used in this study are sufficiently large to construct Potts models for these HIV proteins that adequately reflect the effects of the sequence background on point mutations which are the central focus of this work.…”
Section: Methods (mentioning)
confidence: 88%