Yuanfei Sun scite author profile

Prediction of protein assemblies, the next frontier: The CASP14‐CAPRI experiment

We present the results for CAPRI Round 50, the fourth joint CASP-CAPRI protein assembly prediction challenge. The round comprised twelve targets: six dimers, three trimers, and three higher-order oligomers. Four of these were easy targets, for which good structural templates were available either for the full assembly or for the main interfaces (of the higher-order oligomers). Eight were


Assessment of blind predictions of the clinical significance of BRCA1 and BRCA2 variants

Cline, Babbi, Bonache et al. 2019

Human Mutation


Testing for variation in BRCA1 and BRCA2 (commonly referred to as BRCA1/2) has emerged as a standard clinical practice and is helping countless women better understand and manage their heritable risk of breast and ovarian cancer. Yet the increased rate of BRCA1/2 testing has led to a growing number of Variants of Uncertain Significance (VUS), and the rate of VUS discovery currently outpaces the rate of clinical variant interpretation. Computational prediction is a key component of the variant interpretation pipeline. In the CAGI5 ENIGMA Challenge, six prediction teams submitted predictions on 326 newly‐interpreted variants from the ENIGMA Consortium. By evaluating these predictions against the new interpretations, we have gained a number of insights into the state of the art of variant prediction and identified specific steps to advance it further.


Predicting protein conformational changes for unbound and homology docking: learning from intrinsic and induced flexibility

2016


Predicting protein conformational changes from unbound structures, or even from homology models, to bound structures remains a critical challenge for protein docking. Here we present a study that directly addresses this challenge by reducing the dimensionality and narrowing the range of the corresponding conformational space. The study builds on cNMA, our new framework of partner- and contact-specific normal mode analysis that exploits encounter complexes and considers both intrinsic and induced flexibility. First, we established over a CAPRI (Critical Assessment of PRedicted Interactions) target set that the direction of conformational changes from unbound structures and homology models can be reproduced to a great extent by a small set of cNMA modes. In particular, homology-to-bound interface root-mean-square deviation (iRMSD) can be reduced by 40% on average with the slowest 30 modes. Second, we developed novel and interpretable features from cNMA and used various machine learning approaches to predict the extent of conformational changes. The models learned from a set of unbound-to-bound conformational changes could predict the actual extent of iRMSD with errors around 0.6 Å for unbound proteins in a held-out benchmark subset, around 0.8 Å for unbound proteins in the CAPRI set, and around 1 Å even for homology models in the CAPRI set. Our results shed new light on the origins of conformational differences between homology models and bound structures and provide new support for the low dimensionality of conformational adjustment during protein association. The results also provide new tools for ensemble generation and conformational sampling in unbound and homology docking. Proteins 2017; 85:544-556. © 2016 Wiley Periodicals, Inc.


The ENCODE Imputation Challenge: A critical assessment of methods for cross-cell type imputation of epigenomic profiles

Schreiber, Boix, Lee et al. 2022

Preprint


Functional genomics experiments are invaluable for understanding mechanisms of gene regulation. However, comprehensively performing all such experiments, even across a fixed set of sample and assay types, is often infeasible in practice. A promising alternative is to perform a core set of experiments and then use machine learning methods to impute the remaining ones. However, questions remain about the quality of the imputations, the best approaches for performing them, and even which performance measures meaningfully evaluate such models. In this work, we address these questions by comprehensively analyzing imputations from 23 imputation models submitted to the ENCODE Imputation Challenge. We find that measuring the quality of imputations is significantly more challenging than reported in the literature, and is confounded by three factors: major distributional shifts that arise from differences in data collection and processing over time, the amount of available data per cell type, and redundancy among performance measures. Our systematic analyses suggest several simple but necessary steps for fairly evaluating the performance of such models, as well as promising directions for more robust research in this area.


Assessing the performance of in silico methods for predicting the pathogenicity of variants in the gene CHEK2, among Hispanic females with breast cancer

et al. 2019


The availability of disease‐specific genomic data is critical for developing new computational methods that predict the pathogenicity of human variants and advance the field of precision medicine. However, the lack of gold standards to properly train and benchmark such methods is one of the greatest challenges in the field. In response, the scientific community is invited to participate in the Critical Assessment of Genome Interpretation (CAGI), where unpublished disease variants are made available for classification by in silico methods. As part of the CAGI‐5 challenge, we evaluated the performance of 18 submissions and three additional methods in predicting the pathogenicity of single nucleotide variants (SNVs) in checkpoint kinase 2 (CHEK2) for cases of breast cancer in Hispanic females. As part of the assessment, we also considered the efficacy of the analysis method and the setup of the challenge. The results indicated that, although the challenge could benefit from additional participant data, the combined generalized linear model analysis and odds of pathogenicity analysis provided a framework for evaluating the submitted methods for SNV pathogenicity identification and for comparing them with other available methods. The outcome of this challenge and the approaches used can help guide further advances in identifying SNV‐disease relationships.

