Detection of DNA copy number aberrations by shallow whole-genome sequencing (WGS) faces many challenges, including lack of completion and errors in the human reference genome, repetitive sequences, polymorphisms, variable sample quality, and biases in the sequencing procedures. Formalin-fixed paraffin-embedded (FFPE) archival material, the analysis of which is important for studies of cancer, presents particular analytical difficulties due to degradation of the DNA and frequent lack of matched reference samples. We present a robust, cost-effective WGS method for DNA copy number analysis that addresses these challenges more successfully than currently available procedures. In practice, very useful profiles can be obtained with ~0.13 genome coverage. We improve on previous methods, first, by implementing a combined correction for sequence mappability and GC content and, second, by applying this procedure to sequence data from the 1000 Genomes Project in order to develop a blacklist of problematic genome regions. A small subset of these blacklisted regions was previously identified by ENCODE, but the vast majority are novel, previously unappreciated problematic regions. Our procedures are implemented in a pipeline called QDNAseq. We have analyzed over 1000 samples, most of which were obtained from the fixed-tissue archives of more than 25 institutions. We demonstrate that for most samples our sequencing and analysis procedures yield genome profiles with noise levels near the statistical limit imposed by read counting. The described procedures also correct artifacts introduced by low DNA quality better than prior approaches and yield better copy number data than high-resolution microarrays at a substantially lower cost.
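The abstract names three ideas without showing how they fit together: a joint GC/mappability correction of binned read counts, a blacklist derived from reference genomes, and a noise floor set by read counting. The sketch below is a minimal illustration under stated assumptions, not QDNAseq's implementation. The abstract does not say how the combined correction is fitted, so a median over a coarse (GC, mappability) grid stands in for it here. The function names, grid widths, and the `tolerance` used to flag bins are all hypothetical. The noise floor uses the standard Poisson result that a count with mean λ has relative standard deviation 1/√λ.

```python
import math
import statistics
from collections import defaultdict


def correct_gc_mappability(counts, gc, mappability, gc_step=0.05, map_step=0.1):
    """Divide each bin's read count by the median count of bins with similar GC and mappability.

    Hypothetical stand-in: a coarse 2-D grid replaces whatever smooth fit the real pipeline uses.
    """
    def cell(i):
        # Assign bin i to a grid cell over (GC fraction, mappability).
        return (int(gc[i] / gc_step), int(mappability[i] / map_step))

    groups = defaultdict(list)
    for i, count in enumerate(counts):
        groups[cell(i)].append(count)
    medians = {key: statistics.median(values) for key, values in groups.items()}

    corrected = []
    for i, count in enumerate(counts):
        expected = medians[cell(i)]
        # 1.0 means "as many reads as comparable bins"; NaN marks cells whose median is zero.
        corrected.append(count / expected if expected > 0 else float("nan"))
    return corrected


def flag_problem_bins(corrected_by_sample, tolerance=0.2):
    """Indices of bins whose median corrected ratio across reference samples is far from 1.0.

    Hypothetical blacklist rule: a bin that deviates consistently in normal genomes is suspect.
    """
    n_bins = len(corrected_by_sample[0])
    flagged = []
    for b in range(n_bins):
        median_ratio = statistics.median([sample[b] for sample in corrected_by_sample])
        if abs(median_ratio - 1.0) > tolerance:
            flagged.append(b)
    return flagged


def poisson_noise_floor(mean_count):
    """Relative standard deviation expected from read counting alone (Poisson: 1 / sqrt(mean))."""
    return 1.0 / math.sqrt(mean_count)
```

For example, bins averaging 100 reads carry about 10% relative noise from counting alone (`poisson_noise_floor(100)` is 0.1). This is why a method can at best approach, not beat, that limit at a given coverage and bin size.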
Background: The growth of high-throughput technologies such as microarrays and next-generation sequencing has been accompanied by active research in data analysis methodology, producing new analysis methods at a rapid pace. While most of the newly developed methods are freely available, their use requires substantial computational skills. To enable biologists without programming experience to benefit from method development in a timely manner, we have created the Chipster software. Results: Chipster (http://chipster.csc.fi/) brings a powerful collection of data analysis methods within reach of bioscientists via its intuitive graphical user interface. Users can analyze and integrate different data types such as gene expression, miRNA and aCGH. The analysis functionality is complemented with rich interactive visualizations, allowing users to select data points and create new gene lists based on these selections. Importantly, users can save the performed analysis steps as reusable, automatic workflows, which can also be shared with other users. Being a versatile and easily extendable platform, Chipster can be used for microarray, proteomics and sequencing data. In this article we describe its comprehensive collection of analysis and visualization tools for microarray data using three case studies. Conclusions: Chipster is user-friendly analysis software for high-throughput data. Its intuitive graphical user interface enables biologists to access a powerful collection of data analysis and integration tools and to visualize data interactively. Users can collaborate by sharing analysis sessions and workflows. Chipster is open source, and the server installation package is freely available.
We performed an integrated array comparative genomic hybridization (aCGH) and expression microarray analysis of 8 normal gastric tissues and 38 primary tumors, including 25 intestinal and 13 diffuse gastric adenocarcinomas, to identify genes whose expression is deregulated in association with copy number alteration. We also aimed to identify molecular genetic alterations that are specific to particular clinicopathological characteristics of gastric cancer. Distinct molecular genetic profiles were identified for intestinal and diffuse gastric cancers and for tumors obtained from 2 different locations of the stomach. Interestingly, ERBB2 amplification and gains at 20q13.12-q13.33 almost exclusively discriminated intestinal cancers from the diffuse type. In addition, the 17q12-q25 gain was characteristic of cancers located in the corpus, and the 20q13.12-q13.13 gain was more common in the antrum. Statistical analysis was performed using integrated copy number and expression data to identify genes showing differential expression associated with a copy number alteration. Genes with the highest statistical significance included ERBB2, MUC1, GRB7, PPP1R1B and PPARBP, with concomitant changes in copy number and expression. Immunohistochemical analysis of ERBB2 and MUC1 on a tissue microarray containing 78 independent gastric tissues showed statistically significant differences in immunopositivity between the intestinal (31% and 70%) and diffuse (14% and 41%) subtypes (p < 0.05 and p < 0.001, respectively). In conclusion, our results demonstrate that intestinal and diffuse type gastric cancers, as well as cancers located in different sites of the stomach, have distinct molecular profiles, which may have clinical value.
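The abstract does not state which statistic linked copy number to expression. A common approach, shown here only as a minimal sketch, splits samples by whether a gene is gained and compares expression between the two groups with Welch's t statistic. The function names and the toy `geneA`/`geneB` inputs are hypothetical. The sketch ranks genes by t and leaves out p-values and multiple-testing correction, which a real analysis would need.

```python
import math
import statistics


def welch_t(a, b):
    """Welch's t statistic for mean(a) - mean(b); returns None if both groups have zero variance."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    if se == 0:
        return None
    return (statistics.mean(a) - statistics.mean(b)) / se


def rank_cn_associated_genes(expression, gained, min_group=2):
    """Rank genes by how much higher expression is in samples where the gene is gained.

    expression: dict gene -> expression values, one per sample
    gained:     dict gene -> booleans in the same sample order (True = copy number gain)
    """
    scores = {}
    for gene, values in expression.items():
        with_gain = [v for v, g in zip(values, gained[gene]) if g]
        without_gain = [v for v, g in zip(values, gained[gene]) if not g]
        # statistics.variance needs at least two values per group, so min_group must be >= 2.
        if len(with_gain) < min_group or len(without_gain) < min_group:
            continue
        t = welch_t(with_gain, without_gain)
        if t is not None:
            scores[gene] = t
    # Largest positive t first: expression is highest where the gene is gained.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

A gene whose expression rises with its copy number (like the hypothetical `geneA` with values 5 to 6 when gained and about 1 otherwise) gets a large positive t and rises to the top of the ranking.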
BAP1 mutations occurred in asbestos-exposed MM. MRPL1, SDK1, SEMA5B, and INPP4A may serve as candidate genes for alterations associated with asbestos exposure. KRAS mutations in LAC were not associated with asbestos exposure.
We conclude that GNTs with diverse morphologies share molecular features, and our findings support the need to improve classification and differential diagnosis of tumour entities within the spectrum of GNTs, as well as their distinction from other gliomas.
Genetic alterations of the short arm of chromosome 9 are frequent in acute lymphoblastic leukemia. We performed targeted sequencing of the 9p region in 35 adolescent and adult acute lymphoblastic leukemia patients, aiming to compare its sensitivity for detecting copy number alterations with that of array comparative genomic hybridization (aCGH) and to detect novel genetic anomalies. We found high concordance between the copy number variations (CNVs) detected by next-generation sequencing (NGS) and by aCGH. Both methods identified the recurrent deletion at the CDKN2A/B locus; NGS revealed additional small CNV regions, seen more frequently in adult patients, whereas aCGH was better at detecting larger CNVs. NGS also detected novel structural variants, novel SNVs, and small insertion/deletion variants. Our results show that NGS, in addition to detecting mutations and other genetic aberrations, can be used to study CNVs.
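Targeted-sequencing CNV detection is usually based on read depth. Each region's coverage in the patient is compared with the same region in a reference, such as a normal sample or a pooled normal, after normalizing away differences in total sequencing depth. The abstract does not describe the calling procedure. The following is a minimal sketch under that assumption only. The cutoffs, pseudocount, and function name are hypothetical.

```python
import math


def call_cnv_by_depth(tumor_depth, reference_depth, loss_cutoff=-0.4, gain_cutoff=0.3, pseudocount=0.5):
    """Call gain/loss per target region from the log2 ratio of depth-normalized read counts.

    Each sample is scaled by its total reads so that overall sequencing depth cancels out.
    Cutoffs are illustrative, not taken from the study.
    """
    tumor_total = sum(tumor_depth)
    reference_total = sum(reference_depth)
    calls = []
    for t, r in zip(tumor_depth, reference_depth):
        # The pseudocount keeps log2 finite when a region has zero reads (e.g. a homozygous deletion).
        ratio = ((t + pseudocount) / tumor_total) / ((r + pseudocount) / reference_total)
        log2_ratio = math.log2(ratio)
        if log2_ratio <= loss_cutoff:
            call = "loss"
        elif log2_ratio >= gain_cutoff:
            call = "gain"
        else:
            call = "neutral"
        calls.append((round(log2_ratio, 3), call))
    return calls
```

Normalizing by total reads is the simplest choice. A large aberration in a small panel shifts every other region's ratio slightly, so normalizing by the median region is a common, more robust alternative.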
Background: The disease course of patients with diffuse low-grade glioma is notoriously unpredictable. Temporal and spatially distinct samples may provide insight into the evolution of clinically relevant copy number aberrations (CNAs). The purpose of this study is to identify CNAs that are indicative of aggressive tumor behavior and can thereby complement the prognostically favorable 1p/19q co-deletion.
The use of genome-wide and high-throughput screening methods on large sample sizes is a well-grounded approach when studying a process as complex and heterogeneous as tumorigenesis. Gene copy number changes are one of the main mechanisms causing cancerous alterations in gene expression and can be detected using array comparative genomic hybridization (aCGH). Microarrays are well suited for the integrative systems biology approach, but none of the existing microarray databases focuses on copy number changes. We present here CanGEM (Cancer GEnome Mine), a public, web-based database for storing quantitative microarray data and relevant metadata about the measurements and samples. CanGEM supports the MIAME standard and, in addition, stores clinical information using standardized controlled vocabularies whenever possible. Microarray probes are re-annotated with their physical coordinates in the human genome, and aCGH data are analyzed to yield gene-specific copy numbers. Users can build custom datasets by querying for specific clinical sample characteristics or copy number changes of individual genes. Aberration frequencies can be calculated for these datasets, and the data can be visualized on the human genome map with gene annotations. Furthermore, the original data files are available for more detailed analysis. The CanGEM database can be accessed at http://www.cangem.org/.
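CanGEM's abstract says aCGH data are turned into gene-specific copy numbers and that aberration frequencies can be computed for a dataset, but not how. One plausible scheme is sketched below, assuming each sample has been segmented into (chromosome, start, end, log2 ratio) intervals. It assigns each gene the value of the segment covering its midpoint and counts samples past gain/loss cutoffs. The midpoint rule, cutoffs, and function names are hypothetical choices for illustration, not CanGEM's documented method.

```python
def gene_copy_number(chrom, start, end, segments):
    """Log2 ratio of the segment covering the gene's midpoint, or None if no segment covers it."""
    midpoint = (start + end) // 2
    for seg_chrom, seg_start, seg_end, log2_ratio in segments:
        if seg_chrom == chrom and seg_start <= midpoint <= seg_end:
            return log2_ratio
    return None


def aberration_frequencies(genes, samples, gain_cutoff=0.2, loss_cutoff=-0.2):
    """Per gene: (fraction of samples gained, fraction lost, number of samples with data).

    genes:   dict gene -> (chromosome, start, end)
    samples: list of segment lists, each segment (chromosome, start, end, log2_ratio)
    """
    result = {}
    for gene, (chrom, start, end) in genes.items():
        values = [gene_copy_number(chrom, start, end, segs) for segs in samples]
        values = [v for v in values if v is not None]  # skip samples without coverage of this gene
        n = len(values)
        if n == 0:
            result[gene] = (0.0, 0.0, 0)
            continue
        gains = sum(1 for v in values if v >= gain_cutoff)
        losses = sum(1 for v in values if v <= loss_cutoff)
        result[gene] = (gains / n, losses / n, n)
    return result
```

Reporting the number of samples with data alongside each frequency keeps genes covered by only a few arrays from looking as reliable as well-covered ones.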