HoJoon Lee scite author profile

A draft human pangenome reference

Here the Human Pangenome Reference Consortium presents a first draft of the human pangenome reference. The pangenome contains 47 phased, diploid assemblies from a cohort of genetically diverse individuals¹. These assemblies cover more than 99% of the expected sequence in each genome and are more than 99% accurate at the structural and base pair levels. Based on alignments of the assemblies, we generate a draft pangenome that captures known variants and haplotypes and reveals new alleles at structurally complex loci. We also add 119 million base pairs of euchromatic polymorphic sequences and 1,115 gene duplications relative to the existing reference GRCh38. Roughly 90 million of the additional base pairs are derived from structural variation. Using our draft pangenome to analyse short-read data reduced small variant discovery errors by 34% and increased the number of structural variants detected per haplotype by 104% compared with GRCh38-based workflows, which enabled the typing of the vast majority of structural variant alleles per sample.

The Cancer Genome Atlas Clinical Explorer: a web and mobile interface for identifying clinical–genomic driver associations

Lee

Palm

Grimes

et al. 2015

Genome Med

Background: The Cancer Genome Atlas (TCGA) project has generated genomic data sets covering over 20 malignancies. These data provide valuable insights into the underlying genetic and genomic basis of cancer. However, exploring the relationship among TCGA genomic results and clinical phenotype remains a challenge, particularly for individuals lacking formal bioinformatics training. Overcoming this hurdle is an important step toward the wider clinical translation of cancer genomic/proteomic data and implementation of precision cancer medicine. Several websites such as the cBio portal or University of California Santa Cruz genome browser make TCGA data accessible but lack interactive features for querying clinically relevant phenotypic associations with cancer drivers. To enable exploration of the clinical–genomic driver associations from TCGA data, we developed the Cancer Genome Atlas Clinical Explorer. Description: The Cancer Genome Atlas Clinical Explorer interface provides a straightforward platform to query TCGA data using one of the following methods: (1) searching for clinically relevant genes, micro RNAs, and proteins by name, cancer types, or clinical parameters; (2) searching for genomic/proteomic profile changes by clinical parameters in a cancer type; or (3) testing two-hit hypotheses. SQL queries run in the background and results are displayed on our portal in an easy-to-navigate interface according to the user’s input. To derive these associations, we relied on elastic-net estimates of optimal multiple linear regularized regression and clinical parameters in the space of multiple genomic/proteomic features provided by TCGA data. Moreover, we identified and ranked gene/micro RNA/protein predictors of each clinical parameter for each cancer. The robustness of the results was estimated by bootstrapping. Overall, we identify associations of potential clinical relevance among genes/micro RNAs/proteins using our statistical analysis from 25 cancer types and 18 clinical parameters that include clinical stage or smoking history. Conclusion: The Cancer Genome Atlas Clinical Explorer enables the cancer research community and others to explore clinically relevant associations inferred from TCGA data. With its accessible web and mobile interface, users can examine queries and test hypotheses regarding genomic/proteomic alterations across a broad spectrum of malignancies. Electronic supplementary material: The online version of this article (doi:10.1186/s13073-015-0226-3) contains supplementary material, which is available to authorized users.

Single-cell analysis can define distinct evolution of tumor sites in follicular lymphoma

Haebe

Shree

Sathe

et al. 2021

Tumor heterogeneity complicates biomarker development and fosters drug resistance in solid malignancies. In lymphoma, our knowledge of site-to-site heterogeneity and its clinical implications is still limited. Here, we profiled two nodal, synchronously-acquired tumor samples from ten follicular lymphoma patients using single cell RNA, B cell receptor (BCR) and T cell receptor sequencing, and flow cytometry. By following the rapidly mutating tumor immunoglobulin genes, we discovered that BCR subclones were shared between the two tumor sites in some patients, but in many patients the disease had evolved separately with limited tumor cell migration between the sites. Patients exhibiting divergent BCR evolution also exhibited divergent tumor gene expression and cell surface protein profiles. While the overall composition of the tumor microenvironment did not differ significantly between sites, we did detect a specific correlation between site-to-site tumor heterogeneity and T follicular helper (Tfh) cell abundance. We further observed enrichment of particular ligand-receptor pairs between tumor and Tfh cells, including CD40 and CD40LG, and a significant correlation between tumor CD40 expression and Tfh proliferation. Our study may explain discordant responses to systemic therapies, underscores the difficulty of capturing a patient's disease with a single biopsy, and furthers our understanding of tumor-immune networks in follicular lymphoma.

CRISPR–Cas9-targeted fragmentation and selective sequencing enable massively parallel microsatellite analysis

et al. 2017

Microsatellites are multi-allelic and composed of short tandem repeats (STRs) with individual motifs composed of mononucleotides, dinucleotides or higher including hexamers. Next-generation sequencing approaches and other STR assays rely on a limited number of PCR amplicons, typically in the tens. Here, we demonstrate STR-Seq, a next-generation sequencing technology that analyses over 2,000 STRs in parallel, and provides the accurate genotyping of microsatellites. STR-Seq employs in vitro CRISPR–Cas9-targeted fragmentation to produce specific DNA molecules covering the complete microsatellite sequence. Amplification-free library preparation provides single molecule sequences without unique molecular barcodes. STR-selective primers enable massively parallel, targeted sequencing of large STR sets. Overall, STR-Seq has higher throughput, improved accuracy and provides a greater number of informative haplotypes compared with other microsatellite analysis approaches. With these new features, STR-Seq can identify a 0.1% minor genome fraction in a DNA mixture composed of different, unrelated samples.

RNA Transcription and Splicing Errors as a Source of Cancer Frameshift Neoantigens for Vaccines

Shen

Zhang

Lee

et al. 2019

Sci Rep

The success of checkpoint inhibitors in cancer therapy is largely attributed to activating the patient's immune response to their tumor's neoantigens arising from DNA mutations. This realization has motivated the interest in personal cancer vaccines based on sequencing the patient's tumor DNA to discover neoantigens. Here we propose an additional, unrecognized source of tumor neoantigens. We show that errors in transcription of microsatellites (MS) and mis-splicing of exons create highly immunogenic frameshift (FS) neoantigens in tumors. The sequences of these FS neoantigens are predictable, allowing creation of a peptide array representing all possible neoantigen FS peptides. This array can be used to detect the antibody response in a patient to the FS peptides. A survey of 5 types of cancers reveals peptides that are personally reactive for each patient. This source of neoantigens and the method to discover them may be useful in developing cancer vaccines. Checkpoint inhibitor immunotherapeutics are revolutionizing cancer therapy. However, even in the most responsive cancers a substantial portion (50-80%) of the patients have poor to no positive response 1-5. A surprising finding in the analysis of these patients was that one of the best correlates of response has been the total number of neoantigens in the tumor 6-8. This is also the case for patients with high microsatellite instability (MSI) where the production of FS neoantigens drives the effective anti-tumor immune responses 9-11. The realization of the immunological importance of these DNA mutations has spawned the effort to develop personal vaccines 12. As promising as early studies are of these vaccines, a major problem is that the majority of tumors will not have enough neoantigen-generating mutations to sustain development of a personal vaccine 13-15. For example, melanoma tumors have a high mutational level with an average of 200 neoepitope mutations. 
This provides a large number to algorithmically screen for optimal antigenic presentation. In recent reports of two Phase I clinical trials of personal melanoma vaccines, starting with 90~2,000 personal neoantigens, 10 or 20 were identified for the vaccine 16,17. However, in glioblastoma multiforme (GBM) only 3.5% of patients had a high tumor mutation load, and further analysis showed that only a very small subset of GBM patients would potentially benefit from checkpoint blockade treatment 18. This is also consistent with a lack of response in GBM patients to checkpoint inhibitors 19. Massive genomic sequencing results indicated that GBM, ovarian cancer, breast adenocarcinoma and many other cancer types had very low numbers of non-synonymous mutations, which will make these cancers difficult targets for personalized cancer vaccines 14,20. To solve this problem, we have investigated an alternative source of neoantigens which could possibly expand the scope of the application and efficacy of the neoantigen-based cancer vaccines. In the process of becoming a tumor, not only does the DNA mutation rate increase wit...

Systematic genomic identification of colorectal cancer genes delineating advanced from early clinical stage and metastasis

Lee

Flaherty

2013

BMC Med Genomics

Background: Colorectal cancer is the third leading cause of cancer deaths in the United States. The initial assessment of colorectal cancer involves clinical staging that takes into account the extent of primary tumor invasion, determining the number of lymph nodes with metastatic cancer and the identification of metastatic sites in other organs. Advanced clinical stage indicates metastatic cancer, either in regional lymph nodes or in distant organs. While the genomic and genetic basis of colorectal cancer has been elucidated to some degree, less is known about the identity of specific cancer genes that are associated with advanced clinical stage and metastasis. Methods: We compiled multiple genomic data types (mutations, copy number alterations, gene expression and methylation status) as well as clinical meta-data from The Cancer Genome Atlas (TCGA). We used an elastic-net regularized regression method on the combined genomic data to identify genetic aberrations and their associated cancer genes that are indicators of clinical stage. We ranked candidate genes by their regression coefficient and level of support from multiple assay modalities. Results: A fit of the elastic-net regularized regression to 197 samples and integrated analysis of four genomic platforms identified the set of top gene predictors of advanced clinical stage, including: WRN, SYK, DDX5 and ADRA2C. These genetic features were identified robustly in bootstrap resampling analysis. Conclusions: We conducted an analysis integrating multiple genomic features including mutations, copy number alterations, gene expression and methylation. This integrated approach in which one considers all of these genomic features performs better than any individual genomic assay. We identified multiple genes that robustly delineate advanced clinical stage, suggesting their possible role in colorectal cancer metastatic progression.

Colorectal Cancer Metastases in the Liver Establish Immunosuppressive Spatial Networking between Tumor-Associated SPP1+ Macrophages and Fibroblasts

Sathe

Mason

Grimes

et al. 2022

Purpose: The liver is the most frequent metastatic site for colorectal cancer (CRC). Its microenvironment is modified to provide a niche that is conducive for CRC cell growth. This study focused on characterizing the cellular changes in the metastatic CRC (mCRC) liver tumor microenvironment (TME). Experimental Design: We analyzed a series of microsatellite stable (MSS) mCRCs to the liver, paired normal liver tissue and peripheral blood mononuclear cells using single cell RNA-seq (scRNA-seq). We validated our findings using multiplexed spatial imaging and bulk gene expression with cell deconvolution. Results: We identified TME-specific SPP1-expressing macrophages with altered metabolism features, foam cell characteristics and increased activity in extracellular matrix (ECM) organization. SPP1+ macrophages and fibroblasts expressed complementary ligand-receptor pairs with the potential to mutually influence their gene expression programs. TME lacked dysfunctional CD8 T cells and contained regulatory T cells, indicative of immunosuppression. Spatial imaging validated these cell states in the TME. Moreover, TME macrophages and fibroblasts had close spatial proximity, which is a requirement for intercellular communication and networking. In an independent cohort of mCRCs in the liver, we confirmed the presence of SPP1+ macrophages and fibroblasts using gene expression data. An increased proportion of TME fibroblasts was associated with a worse prognosis in these patients. Conclusions: We demonstrated that mCRC in the liver is characterized by transcriptional alterations of macrophages in the TME. Intercellular networking between macrophages and fibroblasts supports CRC growth in the immunosuppressed metastatic niche in the liver. These features can be used to target immune checkpoint resistant MSS tumors.

Pangenome graph construction from genome alignments with Minigraph-Cactus

et al. 2023
