2018
DOI: 10.1093/bioinformatics/bty523

The choice of sequence homologs included in multiple sequence alignments has a dramatic impact on evolutionary conservation analysis

Abstract: Supplementary data are available at Bioinformatics online.

citations
Cited by 17 publications
(21 citation statements)
references
References 49 publications
“…Even if we consider the single best prediction out of all the groups for each of the 32 targets, we get an overall average F‐score of 0.24 (the highest single F‐score achieved by any group and any target is 0.76 in this set). A recent work estimated that random residue‐based prediction results in an average F‐score of approximately 0.12, with a very sharp normal distribution. Therefore, although an average F‐score of 0.24 is certainly statistically significant, it is also clear that there is much room to further improve contact predictions.…”
Section: Discussion (mentioning)
confidence: 99%
“…Sequence‐based interface predictions were performed as follows, adapted from previous studies: A given query protein's amino acid sequence was searched through the NCBI “nr” database using jackhmmer 3.1 with a domain‐based e‐value cutoff of 10⁻²⁰ and otherwise default parameters, generating a sequence profile typically including several thousand hits. The jackhmmer profile was then subset into 264 alternative MSAs by combinatorially applying three sequence identity filters: the minimum (set at 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, and 60%) and maximum (set at 50%, 70%, 90%, and 99%) sequence identity between query and hits, and the maximum sequence identity (clustering level) among hits (set at 40%, 50%, 60%, 70%, 80%, 90%, 95%, and 99%). The total number of combinations of all parameters is 288 but the minimum and maximum sequence identities of hits to the query have an overlap in the middle range, which reduces the possible number of combinations to 264.…”
Section: Methods (mentioning)
confidence: 99%
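
The parameter counts in the Methods excerpt above can be checked with a short enumeration. This is only a sketch, not the citing authors' code: the three value lists are copied from the excerpt, while the names (`MIN_ID`, `MAX_ID`, `CLUSTER_ID`, `filter_grid`) and the rule that a combination is kept only when the minimum identity is strictly below the maximum identity are assumptions. That rule is one reading of the "overlap in the middle range" that happens to reproduce the stated total of 264.

```python
from itertools import product

# Thresholds (in percent) as listed in the quoted Methods excerpt.
MIN_ID = [20, 25, 30, 35, 40, 45, 50, 55, 60]    # min identity, query vs. hit
MAX_ID = [50, 70, 90, 99]                        # max identity, query vs. hit
CLUSTER_ID = [40, 50, 60, 70, 80, 90, 95, 99]    # max identity among hits


def filter_grid():
    """Return (all combinations, combinations kept after the overlap rule).

    Assumption (not stated in the excerpt): a combination is valid only
    if the minimum identity is strictly lower than the maximum identity.
    """
    all_combos = list(product(MIN_ID, MAX_ID, CLUSTER_ID))
    valid = [(lo, hi, cl) for lo, hi, cl in all_combos if lo < hi]
    return all_combos, valid


all_combos, valid = filter_grid()
print(len(all_combos), len(valid))
```

Under this assumption, the three excluded (minimum, maximum) pairs are 50/50, 55/50, and 60/50, each combined with the eight clustering levels, which accounts for the 24 combinations dropped from 288.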
“…On the other hand, many machine learning approaches have been developed that combine sequence and structural features to arrive at binding interface predictions. Recent benchmarks suggest that the field of feature‐based binding interface prediction appears to have saturated, as the addition of new properties results in little improvement in performance, and argue that future improvements may be expected from customized predictors that focus on specific classes of proteins.…”
Section: Introduction (mentioning)
confidence: 99%