Accurate identification of DNA regulatory elements has become an urgent need in the post-genomic era. Recent genome-wide chromatin-state mapping efforts revealed that DNA elements are associated with characteristic chromatin modification signatures, based on which several approaches have been developed to predict transcriptional enhancers. However, their practical application is limited by incomplete extraction of chromatin features and by model inconsistency when predicting enhancers across different cell types. To address these issues, we define a set of non-redundant shape features of histone modifications, which show high consistency across cell types and can greatly reduce the dimensionality of feature vectors. Integrating shape features with the machine-learning algorithm AdaBoost, we developed an enhancer prediction method, DELTA (Distal Enhancer Locating Tool based on AdaBoost). We show that DELTA significantly outperforms current enhancer prediction methods in prediction accuracy on different datasets and can predict enhancers in one cell type using models trained in other cell types without loss of accuracy. Overall, our study presents a novel framework for accurately identifying enhancers from epigenetic data across multiple cell types.
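As a rough illustration of the boosting step described above, the following is a minimal, standard-library sketch of AdaBoost with single-feature threshold rules (decision stumps). It is not the DELTA implementation: the two-column feature rows stand in for hypothetical shape features of histone-modification profiles, and the function names, toy values, and number of rounds are all assumptions made for the example.

```python
import math

def train_stump(X, y, w):
    """Return (weighted_error, feature, threshold, sign) of the best stump."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for sign in (1, -1):
                preds = [sign if row[j] >= thr else -sign for row in X]
                err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best

def stump_predict(stump, row):
    _, j, thr, sign = stump
    return sign if row[j] >= thr else -sign

def adaboost_fit(X, y, rounds=5):
    """Fit AdaBoost on labels in {+1, -1}; returns a list of (alpha, stump)."""
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        stump = train_stump(X, y, w)
        # Clamp the error so the log below stays finite on separable data.
        err = min(max(stump[0], 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, stump))
        # Up-weight misclassified samples, down-weight correct ones.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for wi, row, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def adaboost_predict(model, row):
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in model)
    return 1 if score >= 0 else -1

# Toy data (hypothetical): rows are [feature_a, feature_b] per candidate region;
# +1 marks an enhancer, -1 a background region.
X = [[0.9, 0.2], [0.8, 0.3], [0.7, 0.1], [0.2, 0.8], [0.1, 0.9], [0.3, 0.7]]
y = [1, 1, 1, -1, -1, -1]
model = adaboost_fit(X, y)
```

A model fitted this way can be applied to feature rows from another cell type with `adaboost_predict(model, row)`, which is the cross-cell-type use the abstract describes; whether accuracy transfers depends on the features, not on this sketch.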
The identification and interpretation of germline BRCA1/2 variants have become increasingly important in breast and ovarian cancer (OC) treatment. However, there has been no comprehensive analysis of germline BRCA1/2 variants in a Chinese population. Here we performed a systematic review and meta-analysis of such variants from 94 publications. A total of 2,128 BRCA1/2 variant records were extracted, including 601 from BRCA1 and 632 from BRCA2. Of these variants, 414, 734, 449, and 307 were also recorded in the BIC, ClinVar, ENIGMA, and UMD databases, respectively, and 579 variants were newly reported. Subsequent analysis showed that the overall germline BRCA1/2 pathogenic variant frequency was 5.7% and 21.8% in Chinese breast cancer and OC, respectively. Populations with high-risk factors exhibited a higher pathogenic variant percentage. Furthermore, the variant profile in Chinese patients is distinct from that in other ethnic groups, with no distinct founder pathogenic variants. We also tested our in-house American College of Medical Genetics-guided pathogenicity interpretation procedure for Chinese BRCA1/2 variants. Our results achieved a consistency of 91.2-97.6% (5-grade classification) or 98.4-100% (2-grade classification) with public databases. In conclusion, this study represents the first comprehensive meta-analysis of Chinese BRCA1/2 variants and validates our in-house pathogenicity interpretation procedure, thereby providing guidance for further PARP inhibitor development and companion diagnostics in the Chinese population.
CCCTC-binding factor (CTCF) is a multi-functional protein that has been assigned various, even contradictory, roles in the genome. High-throughput sequencing-based technologies such as ChIP-seq and Hi-C have provided the opportunity to assess the multivalent functions of CTCF in the human genome. The location of CTCF-binding sites with respect to genomic features provides insights into the possible roles of this protein. Here we present the first genome-wide survey and characterization of three important functions of CTCF: enhancer insulator, chromatin barrier and enhancer linker. We developed a novel computational framework to discover the multivalent functions of CTCF based on chromatin state and three-dimensional chromatin architecture. We applied our method to five human cell lines and identified ∼46 000 non-redundant CTCF sites related to the three functions. Disparate effects of these functions on gene expression were found and distinct genomic features of these CTCF sites were characterized in GM12878 cells. Finally, we investigated the cell-type specificities of CTCF sites related to these functions across five cell types. Our study provides new insights into the multivalent functions of CTCF in the human genome.
Background: With the recent emergence of immune checkpoint inhibitors, microsatellite instability (MSI) status has become an important biomarker for immune checkpoint blockade therapy. There is growing technical demand for integrating the profiling of different genomic alterations, including MSI analysis, into a single assay to make full use of limited tissue. Methods: Tumor and paired control samples from 64 patients with primary colorectal cancer (CRC) were enrolled in this study, including 14 MSI-high (MSI-H) cases and 50 microsatellite stable (MSS) cases as determined by MSI-PCR. All samples were sequenced with a customized NGS panel covering 2.2 Mb. A training dataset of 28 samples was used to select microsatellite loci, and a novel NGS-based MSI status classifier, USCI-msi, was developed. NGS-based MSI status, single nucleotide variants (SNVs) and tumor mutation burden (TMB) were determined for all patients. Most patients were also independently assessed by immunohistochemistry (IHC) staining. Results: A 9-locus model for detecting microsatellite instability correctly predicted MSI status with 100% sensitivity and specificity compared with MSI-PCR, and showed 84.3% overall concordance with IHC staining. Mutations in cancer driver genes (APC, TP53, and KRAS) were dispersed across MSI-H and MSS cases, while BRAF p.V600E and frameshifts in the TCF7L2 gene occurred only in MSI-H cases. Mismatch repair (MMR)-related genes were highly mutated in MSI-H samples. Conclusion: We established a new NGS-based MSI classifier, USCI-msi, that uses as few as 9 microsatellite loci to detect MSI status in CRC cases. This approach achieved 100% sensitivity and specificity and performed robustly in samples with low tumor purity.
Purpose. Circulating tumor DNA (ctDNA) analysis serves as a noninvasive method with fewer side effects, using peripheral blood. Given that studies on the concordance rate between liquid and solid biopsies in Chinese breast cancer (BC) patients are limited, we sought to examine the concordance rate of different kinds of genomic alterations between paired tissue biopsies and ctDNA samples in Chinese BC cohorts. Materials and Methods. In this study, we analyzed the genomic alteration profiles of 81 solid BC samples and 41 liquid BC samples. The concordance across 136 genes was evaluated. Results. The median mutation count per sample in the 41 ctDNA samples was higher than the median in the 81 tissue samples (p=0.0254; Wilcoxon rank sum test). For mutations at the protein-coding level, 39.0% (16/41) of samples had at least one concordant mutation in the two biopsies. Of the tissue-derived mutations, 20.0% could be detected via ctDNA-based sequencing, whereas 11.7% of ctDNA-derived mutations could be found in paired tissues. At the gene amplification level, the overall concordance rate was 68.3% (28/41). The concordance rate at the gene level for each patient ranged from 83.8% (114/136) to 99.3% (135/136). In addition, the mean variant allele frequency (VAF) of concordant mutations in ctDNA was significantly higher than that of the discordant ones (p<0.001; Wilcoxon rank sum test). Across five representative genes, the overall sensitivity and specificity were 49.0% and 85.9%, respectively. Conclusion. Our results indicated that ctDNA could provide complementary information for genetic characterization when detecting single nucleotide variants (SNVs) and insertions and deletions (InDels).
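The concordance measures reported in this abstract can be sketched in a few lines of Python. This is an illustrative reading rather than the study's pipeline: the mutation labels are placeholders, the function names are hypothetical, and treating tissue as the reference for sensitivity and specificity is an assumption the abstract does not state.

```python
def mutation_concordance(tissue, ctdna):
    """Compare two sets of mutation labels from one patient's paired samples."""
    shared = tissue & ctdna
    return {
        "shared": shared,
        # Fraction of tissue-derived mutations also seen in ctDNA.
        "tissue_found_in_ctdna": len(shared) / len(tissue) if tissue else None,
        # Fraction of ctDNA-derived mutations also seen in tissue.
        "ctdna_found_in_tissue": len(shared) / len(ctdna) if ctdna else None,
    }

def gene_level_concordance(concordant_genes, total_genes):
    """Percentage of genes whose status agrees between the two biopsies."""
    return round(100 * concordant_genes / total_genes, 1)

def sensitivity_specificity(reference, test):
    """Per-gene calls as booleans; `reference` is assumed to be tissue."""
    tp = sum(r and t for r, t in zip(reference, test))
    fn = sum(r and not t for r, t in zip(reference, test))
    tn = sum(not r and not t for r, t in zip(reference, test))
    fp = sum(not r and t for r, t in zip(reference, test))
    sens = tp / (tp + fn) if tp + fn else None
    spec = tn / (tn + fp) if tn + fp else None
    return sens, spec

# Placeholder labels, not real variants.
result = mutation_concordance({"A", "B", "C", "D", "E"}, {"A", "X", "Y"})
```

The per-patient range quoted in the abstract follows directly from the counts it gives: `gene_level_concordance(114, 136)` and `gene_level_concordance(135, 136)` reproduce the 83.8% and 99.3% figures.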
Although several computational tools using next-generation sequencing (NGS) data have been proposed to detect microsatellite instability (MSI) status, they still have limitations and need improvement. We developed the NovoPM-MSI method to detect MSI status based on NGS data. This method evaluates target mononucleotide microsatellite loci that were sequenced during targeted gene enrichment analysis and reports a sample instability score as the fraction of unstable loci within the target set, after assessing locus instability by comparing length distributions in paired tumor-normal samples. We validated this method against the conventional MSI-PCR method in 113 paired colorectal cancer (CRC) specimens and compared the performance of NovoPM-MSI with that of mSINGS and MANTIS in accuracy and runtime efficiency. Using the MSI status from MSI-PCR as the gold standard, the three computational methods showed the same sensitivity of 88.9% but different specificities (NovoPM-MSI 97.1%, MANTIS 86.5% and mSINGS 99.0%). Only NovoPM-MSI could greatly improve both the sensitivity and specificity by setting an ambiguous interval. MANTIS had the shortest average runtime (16.3 sec), followed by NovoPM-MSI (18.3 sec) and mSINGS (109.0 sec). In short, the NovoPM-MSI method provides fast and reliable MSI detection with accuracy comparable to MSI-PCR in paired CRC samples.
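The scoring scheme described above (a per-locus instability call, the fraction of unstable loci as the sample score, and an ambiguous interval) can be sketched as follows. This is not the NovoPM-MSI implementation: the abstract does not say how length distributions are compared, so the total variation distance, the per-locus threshold, and the interval bounds used here are all assumptions chosen for illustration.

```python
from collections import Counter

def length_distribution(lengths):
    """Normalize observed repeat lengths at one locus into frequencies."""
    counts = Counter(lengths)
    total = sum(counts.values())
    return {length: n / total for length, n in counts.items()}

def locus_is_unstable(tumor_lengths, normal_lengths, threshold=0.2):
    """Assumed test: total variation distance between the two distributions."""
    t = length_distribution(tumor_lengths)
    n = length_distribution(normal_lengths)
    keys = set(t) | set(n)
    distance = 0.5 * sum(abs(t.get(k, 0.0) - n.get(k, 0.0)) for k in keys)
    return distance > threshold

def instability_score(loci):
    """Fraction of unstable loci; `loci` maps name -> (tumor, normal) lengths."""
    flags = [locus_is_unstable(t, n) for t, n in loci.values()]
    return sum(flags) / len(flags)

def classify(score, low=0.1, high=0.3):
    """Scores between the hypothetical bounds fall in the ambiguous interval."""
    if score >= high:
        return "MSI-H"
    if score <= low:
        return "MSS"
    return "ambiguous"

# Toy read lengths (hypothetical): locus_1 shifts in tumor, locus_2 does not.
loci = {
    "locus_1": ([10, 10, 9, 8, 8, 7], [10, 10, 10, 10, 10, 10]),
    "locus_2": ([12, 12, 12, 11], [12, 12, 12, 11]),
}
score = instability_score(loci)
```

Samples that land in the ambiguous interval would presumably be deferred to a confirmatory assay rather than forced into a call, which is one way an interval of this kind can raise both sensitivity and specificity on the samples that are called.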
Supplemental Digital Content is available in the text
Front Cover: The cover image is based on the Research Article Comprehensive profiling of BRCA1 and BRCA2 variants in breast and ovarian cancer in Chinese patients by Xianqi Gao et al., https://doi.org/10.1002/humu.23965. Cover image © Yulong Geng Images