Kevin Yao scite author profile

Deep learning sequence-based ab initio prediction of variant effects on expression and disease risk

Key challenges for human genetics, precision medicine and evolutionary biology include deciphering the regulatory code of gene expression and understanding the transcriptional effects of genome variation. However, this is extremely difficult because of the enormous scale of the noncoding mutation space. We developed a deep learning-based framework, ExPecto, that can accurately predict, ab initio from a DNA sequence, the tissue-specific transcriptional effects of mutations, including those that are rare or that have not been observed. We prioritized causal variants within disease- or trait-associated loci from all publicly available genome-wide association studies and experimentally validated predictions for four immune-related diseases. By exploiting the scalability of ExPecto, we characterized the regulatory mutation space for human RNA polymerase II-transcribed genes by in silico saturation mutagenesis and profiled >140 million promoter-proximal mutations. This enables probing of evolutionary constraints on gene expression and ab initio prediction of mutation disease effects, making ExPecto an end-to-end computational framework for the in silico prediction of expression and disease risk.

Whole-genome deep-learning analysis identifies contribution of noncoding mutations to autism risk

Zhou et al. 2019

We address the challenge of detecting the contribution of noncoding mutations to disease with a deep-learning-based framework that predicts specific regulatory effects and the deleterious impact of genetic variants. Applying this framework to 1,790 Autism Spectrum Disorder (ASD) simplex families reveals disease causality of noncoding mutations: ASD probands harbor both transcriptional and post-transcriptional regulation-disrupting de novo mutations of significantly higher functional impact than unaffected siblings. Further analysis suggests involvement of noncoding mutations in synaptic transmission and neuronal development and, taken together with prior studies, reveals a convergent genetic landscape of coding and noncoding mutations in ASD. We demonstrate that sequences carrying prioritized proband mutations possess allele-specific regulatory activity, and highlight a link between noncoding mutations and IQ heterogeneity in ASD probands. Our predictive genomics framework illuminates the role of noncoding mutations in ASD, prioritizes high-impact mutations for further study, and is broadly applicable to complex human diseases.

Evaluation of header metadata extraction approaches and tools for scientific PDF documents

Lipinski, Yao, Breitinger et al. 2013

This paper evaluates the performance of tools for the extraction of metadata from scientific articles. Accurate metadata extraction is an important task for automating the management of digital libraries. This comparative study is a guide for developers looking to integrate the most suitable and effective metadata extraction tool into their software. We shed light on the strengths and weaknesses of seven tools in common use. In our evaluation using papers from the arXiv collection, GROBID delivered the best results, followed by Mendeley Desktop. SciPlore Xtract, PDFMeat, and SVMHeaderParse also delivered good results depending on the metadata type to be extracted.

Rapid ambient degradation of monolayer MoS₂ after heating in air

et al. 2019

We report that heating chemical vapor deposition grown monolayer MoS₂ in air at temperatures as low as 285 °C for 2 h results in rapid degradation of the monolayer within 2.5 weeks of ambient air exposure after heating. We find that the rapid degradation proceeds via the growth of dendrites on the basal plane that have a fractal dimension close to that of diffusion-limited aggregation. We also observe dendrites in unheated samples that have been in ambient air for a year. We attribute the rapid degradation after heating to an increase in MoO₃. We propose that the mechanism for dendrite growth involves the diffusion of H₂O to oxide sites, which results in the liquefaction of the oxides. The liquefied oxides do not protect the surface from further oxidation. Putting heated samples in a dry box for 2 weeks immediately after heating prevents the rapid degradation from occurring.

Chr20q Amplification Defines a Distinct Molecular Subtype of Microsatellite Stable Colorectal Cancer

Zhang, Yao, Zhou et al. 2021

Colorectal cancer is the third leading cause of cancer-related death in the United States. About 15% of colorectal cancers are associated with microsatellite instability (MSI) due to loss of function in the DNA mismatch repair pathway. This subgroup of patients has better survival rates and is more sensitive to immunotherapy. However, it remains unclear whether microsatellite stable (MSS) patients with colorectal cancer can be further stratified into subgroups with differential clinical characteristics. In this study, we analyzed The Cancer Genome Atlas data and found that Chr20q amplification is the most frequent copy number alteration that occurs specifically in colon (46%) and rectum (61%) cancer and is mutually exclusive with MSI. Importantly, MSS patients with Chr20q amplification (MSS-A) were associated with better recurrence-free survival compared with MSS patients without Chr20q amplification (MSS-N; P = 0.03). MSS-A tumors were associated with a high level of chromosomal instability and low immune infiltration. In addition, MSS-A and MSS-N tumors were associated with somatic mutations in different driver genes, with high frequencies of mutated TP53 in MSS-A and mutated KRAS and BRAF in MSS-N. Our results suggest that MSS-A and MSS-N represent two subtypes of MSS colorectal cancer, and such stratification may be used to improve therapeutic treatment in an individualized manner. Significance: This study shows that chromosome 20q amplification occurs predominantly in microsatellite-stable colorectal cancer and defines a distinct subtype with good prognosis, high chromosomal instability, distinct mutation profiles, and low immune infiltration.

A Practical Classification of the Monilias

et al. 1937

From Drs. Stovall and Bubolz:
91 Monilia type I
92 Monilia type II
93 Monilia type III

From Drs. Langeron and Talice:
46 Mycotorula psilosis (their no. 340)
44 Mycotoruloides ovalis (their no. 296)
C-70 Geotrichoides Krusei (their no. 683)
49 Candida tropicalis (their no. 255)
C-76 Candida parapsilosis (their no. 341)
171 Blastodendrion intermedium (their no. 493)
47 Mycocandida mortifera (their no. 516)

From Drs. Reed and Johnstone:
238 Monilia type II
239 Monilia type III
240 Monilia type IV
242 Monilia type VI

METHODS OF IDENTIFICATION

Deep Learning Predicts EBV Status in Gastric Cancer Based on Spatial Patterns of Lymphocyte Infiltration

Zhang, Yao et al. 2021, Cancers

EBV infection occurs in around 10% of gastric cancer cases and represents a distinct subtype, characterized by a unique mutation profile, hypermethylation, and overexpression of PD-L1. Moreover, EBV-positive gastric cancer tends to have higher immune infiltration and a better prognosis. EBV infection status in gastric cancer is most commonly determined using PCR and in situ hybridization, but these methods require good nucleic acid preservation. Detection of EBV status with histopathology images may complement PCR and in situ hybridization as a first step of EBV infection assessment. Here, we developed a deep learning-based algorithm to directly predict EBV infection in gastric cancer from H&E-stained histopathology slides. Our model can predict EBV infection in gastric cancers not only from tumor regions but also from normal regions with potential changes induced by adjacent EBV+ regions within each H&E slide. Furthermore, in cohorts with zero EBV abundances, a significant difference in immune infiltration between high and low EBV score samples was observed, consistent with the immune infiltration difference observed between EBV-positive and EBV-negative samples. Therefore, we hypothesized that our model’s prediction of EBV infection is partially driven by the spatial information of immune cell composition, which was supported by mostly positive local correlations between the EBV score and immune infiltration in both tumor and normal regions across all H&E slides. Finally, EBV scores calculated from our model were found to be significantly associated with prognosis. This framework can be readily applied to develop interpretable models for prediction of virus infection across cancers.

A framework to predict the applicability of Oncotype DX, MammaPrint, and E2F4 gene signatures for improving breast cancer prognostic prediction

Yao, Tong, Cheng 2022, Sci Rep

To improve cancer precision medicine, prognostic and predictive biomarkers are critically needed to aid physicians in deciding treatment strategies in a personalized fashion. Due to the heterogeneous nature of cancer, most biomarkers are expected to be valid only in a subset of patients. Furthermore, there is currently no approach to determine the applicability of biomarkers. In this study, we propose a framework to improve the clinical application of biomarkers. As part of this framework, we develop a clinical outcome prediction model (CPM) and a predictability prediction model (PPM) for each biomarker and use these models to calculate a prognostic score (P-score) and a confidence score (C-score) for each patient. Each biomarker’s P-score indicates its association with patient clinical outcomes, while each C-score reflects the applicability of the biomarker’s CPM to a patient and therefore the confidence of the clinical prediction. We assessed the effectiveness of this framework by applying it to three biomarkers, Oncotype DX, MammaPrint, and an E2F4 signature, which have been used for predicting patient response, pathologic complete response versus residual disease to neoadjuvant chemotherapy (a classification problem), and recurrence-free survival (a Cox regression problem) in breast cancer, respectively. In both applications, our analyses indicated that patients with higher C-scores were more likely to be correctly predicted by the biomarkers, indicating the effectiveness of our framework. This framework provides a useful approach to develop and apply biomarkers in the context of cancer precision medicine.
