As whole-genome sequencing for cancer genome analysis becomes a clinical tool, a full understanding of the variables affecting sequencing analysis output is required. Here, using tumour-normal sample pairs from two different types of cancer, chronic lymphocytic leukaemia and medulloblastoma, we conduct a benchmarking exercise within the context of the International Cancer Genome Consortium. We compare sequencing methods, analysis pipelines and validation methods. We show that using PCR-free methods and increasing sequencing depth to ∼100× is beneficial, as long as the tumour:control coverage ratio remains balanced. We observe widely varying mutation call rates and low concordance among analysis pipelines, reflecting the artefact-prone nature of the raw data and the lack of standards for dealing with the artefacts. However, we show that, using the benchmark mutation set we have created, many issues are in fact easy to remedy and have an immediate positive impact on mutation detection accuracy.
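To make the idea of scoring a pipeline against a benchmark mutation set concrete, the following is a minimal sketch, not the consortium's evaluation code. It assumes each mutation call can be reduced to a `(chromosome, position, ref, alt)` tuple; the function name `evaluate_calls` and the example calls are hypothetical.

```python
def evaluate_calls(calls, truth):
    """Score a set of mutation calls against a benchmark (truth) set.

    Both arguments are sets of hashable call keys, here assumed to be
    (chromosome, position, ref, alt) tuples.
    """
    true_positives = len(calls & truth)
    precision = true_positives / len(calls) if calls else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return {"tp": true_positives, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical calls from one pipeline and a hypothetical benchmark set.
pipeline_calls = {("1", 100, "A", "T"), ("1", 200, "C", "G"), ("2", 50, "G", "A")}
benchmark = {("1", 100, "A", "T"), ("2", 50, "G", "A"), ("3", 10, "T", "C"), ("X", 5, "C", "T")}
scores = evaluate_calls(pipeline_calls, benchmark)
```

Exact tuple matching is the simplest possible criterion; a real comparison would likely also need rules for indel normalization, which are not covered here.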
Cancers require telomere maintenance mechanisms for unlimited replicative potential. We dissected whole-genome sequencing data of over 2,500 matched tumor-control samples from 36 different tumor types to characterize the genomic footprints of these mechanisms. While the telomere content of tumors with ATRX or DAXX mutations (ATRX/DAXX trunc) was increased, tumors with TERT modifications showed a moderate decrease of telomere content. While normally located at the chromosome termini, telomere sequences are also found in intrachromosomal regions. Interstitial telomeric sequences with large blocks of telomere repeats exist in humans and other species and probably arose from ancestral genome rearrangements or other evolutionary events [19]. Recently, ALT-specific, targeted telomere insertions into chromosomes that lead to genomic instability have also been described [20]. Another source of unexpected telomere repeat sites is the stabilizing function of telomeres at broken chromosomes. After a double-strand break, telomeres can be added de novo to the unprotected break sites ("telomere healing") [21, 22] or acquired from other chromosomal positions ("telomere capture") [23, 24]. Here, we characterized the telomere landscape of 2,519 tumor samples from 36 different tumor types using whole-genome sequencing data from the Pan-Cancer Analysis of Whole Genomes (PCAWG) project [25]. Besides determining telomere content and searching for mutations associated with different telomere maintenance mechanisms (TMMs), we systematically detected 2,683 somatic telomere insertions and show that different TMMs are associated with enrichment of previously undescribed singleton TVRs.
bioRxiv preprint first posted online Jun. 30, 2017; doi: http://dx.doi.org/10.1101/157560. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission.
Results: Telomere content across cohorts. Due to the repetitive nature of telomere sequences, short sequencing reads from telomeres cannot be uniquely aligned to individual chromosomes. However, a mean telomere content for the tumor as a whole can be estimated from the number of reads containing telomere sequences [17, 26–29]. Here, we extracted reads containing at least six telomere repeats per 100 bases, allowing the canonical telomere repeat TTAGGG and the three most common TVRs TCAGGG, TGAGGG and TTGGGG. The telomere content was defined as the number of unaligned telomere reads normalized by sequencing coverage
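The read-selection criterion described above (at least six telomere repeats per 100 bases, allowing TTAGGG, TCAGGG, TGAGGG and TTGGGG) can be sketched as follows. This is an illustrative reading of that sentence, not the authors' implementation: the function names are hypothetical, and both the reverse-complement handling and the scaling of the threshold with read length are assumptions. The coverage normalization is not shown because the source text breaks off before describing it.

```python
import re

# Canonical repeat plus the three most common TVRs named in the text.
TELOMERE_HEXAMERS = ("TTAGGG", "TCAGGG", "TGAGGG", "TTGGGG")
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]


_FORWARD = re.compile("|".join(TELOMERE_HEXAMERS))
_REVERSE = re.compile("|".join(reverse_complement(h) for h in TELOMERE_HEXAMERS))


def count_telomere_repeats(read):
    """Count non-overlapping telomere hexamers on whichever strand has more."""
    return max(len(_FORWARD.findall(read)), len(_REVERSE.findall(read)))


def is_telomere_read(read, repeats_per_100_bases=6):
    """Assumed rule: the required repeat count scales linearly with read length."""
    required = repeats_per_100_bases * len(read) / 100
    return count_telomere_repeats(read) >= required
```

For instance, `is_telomere_read("TTAGGG" * 10)` passes the threshold, whereas a 100-base read with only five repeats does not.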
Background: Establishment of telomere maintenance mechanisms is a universal step in tumor development to achieve replicative immortality. These processes leave molecular footprints in cancer genomes in the form of altered telomere content and aberrations in telomere composition. To retrieve these telomere characteristics from high-throughput sequencing data the available computational approaches need to be extended and optimized to fully exploit the information provided by large scale cancer genome data sets. Results: We here present TelomereHunter, a software for the detailed characterization of telomere maintenance mechanism footprints in the genome. The tool is implemented for the analysis of large cancer genome cohorts and provides a variety of diagnostic diagrams as well as machine-readable output for subsequent analysis. A novel key feature is the extraction of singleton telomere variant repeats, which improves the identification and subclassification of the alternative lengthening of telomeres phenotype. We find that whole genome sequencing-derived telomere content estimates strongly correlate with telomere qPCR measurements (r = 0.94). For the first time, we determine the correlation of in silico telomere content quantification from whole genome sequencing and whole genome bisulfite sequencing data derived from the same tumor sample (r = 0.78). An analogous comparison of whole exome sequencing data and whole genome sequencing data measured slightly lower correlation (r = 0.79). However, this is considerably improved by normalization with matched controls (r = 0.91). Conclusions: TelomereHunter provides new functionality for the analysis of the footprints of telomere maintenance mechanisms in cancer genomes. Besides whole genome sequencing, whole exome sequencing and whole genome bisulfite sequencing are suited for in silico telomere content quantification, especially if matched control samples are available. The software runs under a GPL license and is available at https://www.dkfz.de/en/applied-bioinformatics/telomerehunter/telomerehunter.html .
Electronic supplementary material: The online version of this article (10.1186/s12859-019-2851-0) contains supplementary material, which is available to authorized users.
Cancers require telomere maintenance mechanisms for unlimited replicative potential. They achieve this through TERT activation or alternative telomere lengthening associated with ATRX or DAXX loss. Here, as part of the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium, we dissect whole-genome sequencing data of over 2500 matched tumor-control samples from 36 different tumor types to characterize the genomic footprints of these mechanisms. While the telomere content of tumors with ATRX or DAXX mutations (ATRX/DAXX trunc) is increased, tumors with TERT modifications show a moderate decrease of telomere content. One quarter of all tumor samples contain somatic integrations of telomeric sequences into non-telomeric DNA. This fraction is increased to 80% prevalence in ATRX/DAXX trunc tumors, which carry an aberrant telomere variant repeat (TVR) distribution as another genomic marker. The latter feature includes enrichment or depletion of the previously undescribed singleton TVRs TTCGGG and TTTGGG, respectively. Our systematic analysis provides new insight into the recurrent genomic alterations associated with telomere maintenance mechanisms in cancer.
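As an illustration of what counting singleton TVRs such as TTCGGG or TTTGGG might look like, here is a minimal sketch. The abstract does not define "singleton", so the definition used here is an assumption: a single non-canonical NNNGGG hexamer flanked on both sides by a run of canonical TTAGGG repeats, with a flank length of three chosen for illustration. The function name is hypothetical, and only the forward strand is scanned.

```python
import re
from collections import Counter

CANONICAL = "TTAGGG"


def count_singleton_tvrs(read, flank=3):
    """Count variant hexamers (NNNGGG) sitting alone between canonical repeats.

    `flank` is the assumed number of canonical repeats required on each side.
    """
    context = CANONICAL * flank
    # Fixed-width lookbehind and lookahead keep the flanks out of the match,
    # so a flank can be shared by two singletons further along the read.
    pattern = re.compile(f"(?<={context})([ACGT]{{3}}GGG)(?={context})")
    counts = Counter()
    for match in pattern.finditer(read):
        hexamer = match.group(1)
        if hexamer != CANONICAL:
            counts[hexamer] += 1
    return counts


example_read = CANONICAL * 3 + "TTCGGG" + CANONICAL * 3
singletons = count_singleton_tvrs(example_read)
```

Two adjacent variant hexamers are deliberately not counted, since neither is then surrounded by canonical repeats on both sides.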
Heterogeneous accelerators often disappoint. They provide the prospect of great performance, but only deliver it when using vendor specific optimized libraries or domain specific languages. This requires considerable legacy code modifications, hindering the adoption of heterogeneous computing. This paper develops a novel approach to automatically detect opportunities for accelerator exploitation. We focus on calculations that are well supported by established APIs: sparse and dense linear algebra, stencil codes and generalized reductions and histograms. We call them idioms and use a custom constraint-based Idiom Description Language (IDL) to discover them within user code. Detected idioms are then mapped to BLAS libraries, cuSPARSE and clSPARSE and two DSLs: Halide and Lift. We implemented the approach in LLVM and evaluated it on the NAS and Parboil sequential C/C++ benchmarks, where we detect 60 idiom instances. In those cases where idioms are a significant part of the sequential execution time, we generate code that achieves 1.26× to over 20× speedup on integrated and external GPUs.
CCS Concepts: • Computer systems organization → Heterogeneous (hybrid) systems; • Software and its engineering → Domain specific languages
Fast numerical libraries have been a cornerstone of scientific computing for decades, but this comes at a price. Programs may be tied to vendor specific software ecosystems resulting in polluted, non-portable code. As we enter an era of heterogeneous computing, there is an explosion in the number of accelerator libraries required to harness specialized hardware. We need a system that allows developers to exploit ever-changing accelerator libraries, without over-specializing their code. As we cannot know the behavior of future libraries ahead of time, this paper develops a scheme that assists developers in matching their code to new libraries, without requiring the source code for these libraries. Furthermore, it can recover equivalent code from programs that use existing libraries and automatically port them to new interfaces. It first uses program synthesis to determine the meaning of a library, then maps the synthesized description into generalized constraints which are used to search the program for replacement opportunities to present to the developer. We applied this approach to existing large applications from the scientific computing and deep learning domains. Using our approach, we show speedups ranging from 1.1× to over 10× on end to end performance when using accelerator libraries.
Background: Genetic and environmental risk factors are assumed to contribute to the susceptibility to cervical artery dissection (CeAD). To explore the role of genetic imbalance in the etiology of CeAD, copy number variants (CNVs) were identified in high-density microarray data of samples from the multicenter CADISP (Cervical Artery Dissection and Ischemic Stroke Patients) study and of control subjects from the CADISP study and the German PopGen biobank. Microarray data from 833 CeAD patients and 2040 control subjects (565 subjects with ischemic stroke due to causes other than CeAD and 1475 disease-free individuals) were analyzed. Rare genic CNVs were equally frequent in CeAD patients (16.4%; n=137) and in control subjects (17.0%; n=346) but differed with respect to their genetic content. Compared to control subjects, CNVs from CeAD patients were enriched for genes associated with muscle organ development and cell differentiation, which suggests a possible association with arterial development. CNVs affecting cardiovascular system development were more common in CeAD patients than in control subjects (p=0.003; odds ratio (OR)=2.5; 95% confidence interval (95% CI)=1.4-4.5) and more common in patients with a familial history of CeAD than in those with sporadic CeAD (p=0.036; OR=11.2; 95% CI=1.2-107). Conclusion: The findings suggest that rare genetic imbalance affecting cardiovascular system development may contribute to the risk of CeAD. Validation of these findings in independent study populations is warranted.
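For readers unfamiliar with how an odds ratio and its confidence interval are obtained from a 2×2 table, the sketch below applies the standard unadjusted formula with a Wald interval to the carrier counts reported above (137 of 833 CeAD patients and 346 of 2040 control subjects with rare genic CNVs). It is a textbook calculation for illustration only, not a reproduction of the study's statistical analysis, and the function name is hypothetical. The counts behind the reported OR of 2.5 are not given in the abstract, so that comparison is not recomputed.

```python
import math


def odds_ratio_wald(case_pos, case_neg, ctrl_pos, ctrl_neg, z=1.96):
    """Unadjusted odds ratio with a Wald confidence interval (z=1.96 for 95%)."""
    odds_ratio = (case_pos * ctrl_neg) / (case_neg * ctrl_pos)
    standard_error = math.sqrt(
        1 / case_pos + 1 / case_neg + 1 / ctrl_pos + 1 / ctrl_neg
    )
    log_or = math.log(odds_ratio)
    lower = math.exp(log_or - z * standard_error)
    upper = math.exp(log_or + z * standard_error)
    return odds_ratio, lower, upper


# Carrier counts of rare genic CNVs taken from the abstract.
cead_carriers, cead_total = 137, 833
ctrl_carriers, ctrl_total = 346, 2040

estimate, lower, upper = odds_ratio_wald(
    cead_carriers,
    cead_total - cead_carriers,
    ctrl_carriers,
    ctrl_total - ctrl_carriers,
)
```

The resulting estimate lies close to 1 and its interval spans 1, which matches the statement that rare genic CNVs were equally frequent in both groups.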
Background and Purpose— We sought to explore the effect of genetic imbalance on functional outcome after ischemic stroke (IS). Methods— Copy number variation was identified in high-density single-nucleotide polymorphism microarray data of IS patients from the CADISP (Cervical Artery Dissection and Ischemic Stroke Patients) and SiGN (Stroke Genetics Network)/GISCOME (Genetics of Ischaemic Stroke Functional Outcome) networks. Genetic imbalance, defined as total number of protein-coding genes affected by copy number variations in an individual, was compared between patients with favorable (modified Rankin Scale score of 0–2) and unfavorable (modified Rankin Scale score of ≥3) outcome after 3 months. Subgroup analyses were confined to patients with imbalance affecting ohnologs—a class of dose-sensitive genes, or to those with imbalance not affecting ohnologs. The association of imbalance with outcome was analyzed by logistic regression analysis, adjusted for age, sex, stroke subtype, stroke severity, and ancestry. Results— The study sample comprised 816 CADISP patients (age 44.2±10.3 years) and 2498 SiGN/GISCOME patients (age 67.7±14.2 years). Outcome was unfavorable in 122 CADISP and 889 SiGN/GISCOME patients. Multivariate logistic regression analysis revealed that increased genetic imbalance was associated with less favorable outcome in both samples (CADISP: P =0.0007; odds ratio=0.89; 95% CI, 0.82–0.95 and SiGN/GISCOME: P =0.0036; odds ratio=0.94; 95% CI, 0.91–0.98). The association was independent of age, sex, stroke severity on admission, stroke subtype, and ancestry. On subgroup analysis, imbalance affecting ohnologs was associated with outcome (CADISP: odds ratio=0.88; 95% CI, 0.80–0.95 and SiGN/GISCOME: odds ratio=0.93; 95% CI, 0.89–0.98) whereas imbalance without ohnologs lacked such an association. Conclusions— Increased genetic imbalance was associated with poorer functional outcome after IS in both study populations. 
Subgroup analysis revealed that this association was driven by presence of ohnologs in the respective copy number variations, suggesting a causal role of the deleterious effects of genetic imbalance.
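The definition of genetic imbalance used above (the total number of protein-coding genes affected by copy number variations in an individual) can be sketched as an interval-overlap count. This is a minimal illustration under assumptions: "affected" is taken to mean any overlap between a gene and a CNV, coordinates are half-open, and the gene names and positions are hypothetical placeholders rather than real annotations.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive


def overlaps(a, b):
    """True if two half-open intervals on the same chromosome share a base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def genetic_imbalance(cnvs, genes):
    """Number of distinct genes overlapped by at least one CNV.

    `cnvs` is a list of Intervals for one individual; `genes` maps a gene
    name to the Interval of that protein-coding gene.
    """
    affected = {
        name
        for name, locus in genes.items()
        if any(overlaps(locus, cnv) for cnv in cnvs)
    }
    return len(affected)


# Hypothetical annotation and CNV calls for a single individual.
genes = {
    "GENE_A": Interval("chr1", 100, 200),
    "GENE_B": Interval("chr1", 500, 600),
    "GENE_C": Interval("chr2", 100, 200),
}
cnvs = [Interval("chr1", 150, 550), Interval("chr1", 180, 190)]
imbalance = genetic_imbalance(cnvs, genes)
```

Counting distinct genes means a gene hit by two CNVs contributes once. The ohnolog subgroup analyses would correspond to restricting `genes` to, or excluding, a list of ohnologs before counting.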