The choice of sequence homologs included in multiple sequence alignments has a dramatic impact on evolutionary conservation analysis

Gil, Nelson; Fiser, András

doi:10.1093/bioinformatics/bty523

Cited by 17 publications

(21 citation statements)

References 49 publications

“…Even if we consider the single best prediction out of all the groups for each of the 32 targets, we get an overall average F‐score of 0.24 (the highest single F‐score achieved by any group and any target is 0.76 in this set). A recent work estimated that random residue‐based prediction results in an average F‐score of approximately 0.12, with a very sharp normal distribution. Therefore, although an average F‐score of 0.24 is certainly statistically significant, it is also clear that there is much room to further improve contact predictions.…”

Section: Discussion (mentioning)

confidence: 99%
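The F‐score quoted in the statement above combines precision and recall into one number. The following is a minimal sketch, assuming the standard F1 definition (harmonic mean of precision and recall) and assuming contacts are represented as unordered pairs of residue indices. The function name `f_score` and the example pairs are illustrative and do not come from the cited papers.

```python
def f_score(predicted, actual):
    """F1 score between two collections of residue pairs (hypothetical helper)."""
    # Treat (i, j) and (j, i) as the same contact.
    pred = {tuple(sorted(p)) for p in predicted}
    true = {tuple(sorted(p)) for p in actual}
    tp = len(pred & true)  # correctly predicted contacts
    if tp == 0:
        return 0.0
    precision = tp / len(pred)  # fraction of predictions that are correct
    recall = tp / len(true)     # fraction of true contacts recovered
    return 2 * precision * recall / (precision + recall)


# Made-up example: 2 of 4 predictions are correct, 2 of 4 true contacts are found.
predicted = [(1, 9), (2, 8), (3, 7), (4, 6)]
actual = [(9, 1), (2, 8), (5, 12), (6, 14)]
print(f_score(predicted, actual))
```

With precision and recall both at 0.5, the example evaluates to 0.5. An average of 0.24 therefore means that, on balance, well under half of the contacts are both predicted and correct.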

Assessing the accuracy of contact predictions in CASP13

et al. 2019

Self Cite

The accuracy of sequence‐based tertiary contact predictions was assessed in a blind prediction experiment at the CASP13 meeting. After 4 years of significant improvements in prediction accuracy, another dramatic advance has taken place since CASP12 was held 2 years ago. The precision of predicting the top L/5 contacts in the free modeling category, where L is the corresponding length of the protein in residues, has exceeded 70%. As a comparison, the best‐performing group at CASP12 with a 47% precision would have finished below the top 1/3 of the CASP13 groups. Extensively trained deep neural network approaches dominate the top performing algorithms, which appear to efficiently integrate information on coevolving residues and interacting fragments or possibly utilize memories of sequence similarities and sometimes can deliver accurate results even in the absence of virtually any target specific evolutionary information. If the current performance is evaluated by F‐score on L contacts, it stands around 24% right now, which, despite the tremendous impact and advance in improving its utility for structure modeling, also suggests that there is much room left for further improvement.

“…Sequence‐based interface predictions were performed as follows, adapted from previous studies: A given query protein's amino acid sequence was searched through the NCBI “nr” database using jackhmmer 3.1 with a domain‐based e‐value cutoff of 10^−20 and otherwise default parameters, generating a sequence profile typically including several thousand hits. The jackhmmer profile was then subset into 264 alternative MSAs by combinatorially applying three sequence identity filters: the minimum (set at 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, and 60%) and maximum (set at 50%, 70%, 90%, and 99%) sequence identity between query and hits, and the maximum sequence identity (clustering level) among hits (set at 40%, 50%, 60%, 70%, 80%, 90%, 95%, and 99%). The total number of combinations of all parameters is 288 but the minimum and maximum sequence identities of hits to the query have an overlap in the middle range, which reduces the possible number of combinations to 264.…”

Section: Methods (mentioning)

confidence: 99%
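The count of 264 in the Methods statement above can be reproduced by enumerating the filter grid. This is a sketch under one stated assumption: a combination is dropped when the minimum query-to-hit identity is not strictly below the maximum (the "overlap in the middle range"). The quoted text does not spell out the exclusion rule, and the names `MIN_ID`, `MAX_ID`, `CLUSTER`, and `filter_grid` are illustrative.

```python
from itertools import product

# Thresholds (percent sequence identity) as listed in the quoted Methods text.
MIN_ID = [20, 25, 30, 35, 40, 45, 50, 55, 60]   # minimum identity, query vs. hit
MAX_ID = [50, 70, 90, 99]                       # maximum identity, query vs. hit
CLUSTER = [40, 50, 60, 70, 80, 90, 95, 99]      # maximum identity among hits


def filter_grid():
    """Return all (min, max, cluster) settings with min strictly below max.

    The strict inequality is an assumption made for this sketch.
    """
    return [
        (lo, hi, c)
        for lo, hi, c in product(MIN_ID, MAX_ID, CLUSTER)
        if lo < hi
    ]


total = len(MIN_ID) * len(MAX_ID) * len(CLUSTER)
print(total, len(filter_grid()))  # 288 264
```

Under this rule the only excluded pairs are a maximum of 50% combined with a minimum of 50%, 55%, or 60%. Three pairs times eight clustering levels removes 24 settings, leaving 288 − 24 = 264, which matches the quoted number.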

“…On the other hand, many machine learning approaches have been developed that combine sequence and structural features to arrive at binding interface predictions. Recent benchmarks suggest that the field of feature‐based binding interface prediction appears to have saturated, as the addition of new properties results in little improvement in performance, and argue that future improvements may be expected from customized predictors that focus on specific classes of proteins…”

Section: Introduction (mentioning)

confidence: 99%

“…We present a novel structure‐based interface mapping and filtering approach that we used in combination with our recently‐developed multiple sequence alignment (MSA) selection pipeline, selection of alignment by maximal mutual information (SAMMI). We tested the algorithms on the set of 22 IgSF proteins with currently‐known trans‐binding interfaces (Table ), and show that sequence‐based predictions can be used to improve upon structure predictions toward the theoretical limit of binding site prediction. The current limiting factor in achieving this is identifying the optimal set of homologs to include in multiple sequence alignments used in conservation analysis.…”

Section: Introduction (mentioning)

confidence: 99%

“…16 On the other hand, many machine learning approaches have been developed that combine sequence and structural features to arrive at binding interface predictions. 20 Recent benchmarks suggest that the field of feature-based binding interface prediction appears to have saturated, as the addition of new properties results in little improvement in performance, 16,21 and argue that future improvements may be expected from customized predictors that focus on specific classes of proteins. 16 In the current work, we combine structural and sequence information to predict the binding interfaces of IgSF proteins (Figure S1), taking advantage of their structural homogeneity in the context of significant sequence divergence.…”

mentioning

confidence: 99%

Discovery of receptor‐ligand interfaces in the immunoglobulin superfamily

2019

Self Cite

Cell‐surface‐anchored immunoglobulin superfamily (IgSF) proteins are widespread throughout the human proteome, forming crucial components of diverse biological processes including immunity, cell‐cell adhesion, and carcinogenesis. IgSF proteins generally function through protein‐protein interactions carried out between extracellular, membrane‐bound proteins on adjacent cells, known as trans‐binding interfaces. These protein‐protein interactions constitute a class of pharmaceutical targets important in the treatment of autoimmune diseases, chronic infections, and cancer. A molecular‐level understanding of IgSF protein‐protein interactions would greatly benefit further drug development. A critical step toward this goal is the reliable identification of IgSF trans‐binding interfaces. We propose a novel combination of structure and sequence information to identify trans‐binding interfaces in IgSF proteins. We developed a structure‐based binding interface prediction approach that can identify broad regions of the protein surface that encompass the binding interfaces and suggests that IgSF proteins possess binding supersites. These interfaces could theoretically be pinpointed using sequence‐based conservation analysis, with performance approaching the theoretical upper limit of binding interface prediction accuracy, but achieving this in practice is limited by the current ability to identify an appropriate multiple sequence alignment for conservation analysis. However, an important contribution of combining the two orthogonal methods is that agreement between these approaches can estimate the reliability of the predictions. This approach was benchmarked on the set of 22 IgSF proteins with experimentally solved structures in complex with their ligands. Additionally, we provide structure‐based predictions and reliability scores for the 62 IgSF proteins with known structure but yet uncharacterized binding interfaces.

Assessing the functional impact of protein binding site definition

Nandigrami, Fiser

2024

Protein Science

Self Cite

Many biomedical applications, such as classification of binding specificities or bioengineering, depend on the accurate definition of protein binding interfaces. Depending on the choice of method used, substantially different sets of residues can be classified as belonging to the interface of a protein. A typical approach used to verify these definitions is to mutate residues and measure the impact of these changes on binding. Besides the lack of exhaustive data, this approach also suffers from the fundamental problem that a mutation introduces an unknown amount of alteration into an interface, which potentially alters the binding characteristics of the interface. In this study we explore the impact of alternative binding site definitions on the ability of a protein to recognize its cognate ligand using a pharmacophore approach, which does not affect the interface. The study also shows that methods for protein binding interface predictions should perform above approximately F‐score = 0.7 accuracy level to capture the biological function of a protein.

The choice of sequence homologs included in multiple sequence alignments has a dramatic impact on evolutionary conservation analysis

Abstract: Supplementary data are available at Bioinformatics online.
