We describe the genome sequencing of an anonymous individual of African origin using a novel ligation-based sequencing assay that enables a unique form of error correction, improving the raw accuracy of the aligned reads to >99.9% and allowing us to call SNPs accurately with as few as two reads per allele. We collected several billion mate-paired reads, yielding ~18× haploid coverage of aligned sequence and close to 300× clone coverage. Over 98% of the reference genome is covered by at least one uniquely placed read, and 99.65% is spanned by at least one uniquely placed mate-paired clone. We identify over 3.8 million SNPs, 19% of which are novel. Mate-paired data are used to physically resolve the haplotype phase of nearly two-thirds of the genotypes obtained, producing phased segments of up to 215 kb. We detect 226,529 intra-read indels, 5590 indels between mate-paired reads, 91 inversions, and four gene fusions. We use a novel approach to detect indels between mate-paired reads that are smaller than the standard deviation of the library insert size, and we discover deletions in common with those detected by our intra-read approach. Dozens of mutations previously described in OMIM and hundreds of nonsynonymous single-nucleotide and structural variants in genes previously implicated in disease are identified in this individual. More genetic variation in the human genome remains to be uncovered, and we provide guidance for future surveys in populations and cancer biopsies. [Supplemental material is available online at
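The two coverage figures above follow from simple arithmetic: sequence coverage is the total number of aligned bases divided by genome size, whereas clone (physical) coverage counts the full span of each mate-paired insert, which is why it can exceed sequence coverage by more than an order of magnitude. The sketch below is illustrative only. The function names, the toy numbers, and the idealized Lander-Waterman (Poisson) expectation are our additions rather than the authors' pipeline; real coverage by uniquely placed reads is lower than the idealized value because repeats cannot be mapped uniquely.

```python
import math


def sequence_coverage(n_reads: int, read_len: int, genome_size: float) -> float:
    """Mean sequence coverage: total aligned bases divided by genome size."""
    return n_reads * read_len / genome_size


def clone_coverage(n_pairs: int, insert_len: int, genome_size: float) -> float:
    """Mean clone (physical) coverage: total span of mate-paired inserts divided by genome size."""
    return n_pairs * insert_len / genome_size


def expected_fraction_covered(coverage: float) -> float:
    """Idealized Lander-Waterman expectation that a base is covered at least once: 1 - exp(-c).

    Assumes reads land uniformly at random; repeats and mappability make the
    real-world fraction covered by uniquely placed reads lower than this.
    """
    return 1.0 - math.exp(-coverage)


# Hypothetical toy example (not the study's numbers): 1,000 pairs of
# 2 x 50 bp reads with 2 kb inserts on a 100 kb "genome".
pairs, read_len, insert_len, genome = 1_000, 50, 2_000, 100_000
print(sequence_coverage(2 * pairs, read_len, genome))  # 1.0
print(clone_coverage(pairs, insert_len, genome))       # 20.0
```

In this toy case the clone coverage is 20 times the sequence coverage because each pair sequences only 100 bp but spans 2,000 bp (2,000 / 100 = 20).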
Recent studies with tiling arrays have revealed more genomic transcription than previously anticipated, and entirely new groups of non-coding transcripts (NCTs) have been detected. Some of these NCTs, including miRNAs, can regulate gene expression. Most NCTs studied to date have been relatively short, but several important regulatory NCTs, including XIST, MALAT-1, BC1 and BC200, are considerably longer and represent a novel class of long non-coding RNA species. We used whole-genome tiling arrays to identify novel long NCTs across the entire human genome. We identified a new group of long (>400 nt), abundantly expressed NCTs and found that a subset of these are also highly evolutionarily conserved. In this report, we begin to characterize 15 long, conserved NCTs. Quantitative real-time RT-PCR was used to analyze their expression in different normal human tissues and in breast and ovarian cancers. We found altered expression of many of these NCTs in both cancer types. In addition, several of these NCTs carried consistent mutations when sequences from normal samples were compared with those from a panel of cancer-derived cell lines. One NCT was consistently mutated in a panel of endometrial cancers compared with matched normal blood. These NCTs were among the most abundantly expressed transcripts detected; there are probably many other long, conserved NCTs expressed at lower levels. Although the function of these NCTs is currently unknown, our study indicates that they may play important roles both in normal cells and in cancer development.
Common fragile sites (CFSs) are regions of profound genomic instability found in all individuals. Each region of instability ranges in size from under one megabase (Mb) to more than 10 Mb. At least half of the CFS regions span extremely large genes, ranging from 600 kb to more than 2.0 Mb. These large CFS genes are also of considerable interest from a cancer perspective, as several of them, including FHIT and WWOX, have already been shown to function as tumor suppressor genes both in vitro and in vivo. We estimate that there may be 40–50 large genes localized in CFS regions. Expression of a number of the large CFS genes has previously been shown to be lost in many different cancers, and this loss is frequently associated with a worse clinical outcome for patients. To determine whether different cancers select for inactivation of different large CFS genes, we used real-time RT-PCR to examine the expression of 13 of the 20 known large CFS genes (FHIT, WWOX, PARK2, GRID2, NBEA, DLG2, RORA isoforms 1 and 4, DAB1, CNTNAP2, DMD, IL1RAPL1, IMMP2L and LARGE) in breast, ovarian, endometrial and brain cancers. Each cancer type had a distinct profile of inactivated large CFS genes. Interestingly, in breast, ovarian and endometrial cancers, some tumors had lost expression of none or only one of the tested genes, while other specimens had lost expression of multiple tested genes. Brain cancers showed inactivation of many of the tested genes, a number of which function in normal neurological development. We find no relationship between the frequency with which a specific CFS is expressed and the frequency with which the gene from that region is inactivated in different cancers. Instead, different cancers appear to select for the inactivation of different large CFS genes.
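The final comparison amounts to asking whether two per-site frequencies (how often a CFS is expressed, and how often its large gene is inactivated) are associated across CFS regions. A rank correlation is one simple way to frame that question. The abstract does not say which test was used, so the Spearman sketch below, its function names, and any inputs you feed it are hypothetical.

```python
from statistics import mean


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: the Pearson correlation computed on average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("rho is undefined when every value in a sequence is tied")
    return cov / (sx * sy)
```

A rho near zero across sites would be consistent with the reported lack of relationship; with only a handful of sites, a significance test (for example `scipy.stats.spearmanr`, which also returns a p-value) would be needed before drawing that conclusion.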
It has recently become clear that the transcriptional output of the human genome is far more abundant than previously anticipated, with the vast majority of transcripts not coding for protein. Using whole-genome tiling arrays, we analyzed transcription across the entire genome in normal human bronchial epithelial (NHBE) cells and in NHBE cells exposed to the tobacco carcinogen NNK. Our efforts focused on characterizing non-coding transcripts that were longer than 300 nucleotides and whose expression increased in response to NNK. We identified 12 Long Stress-Induced Non-coding Transcripts, which we term LSINCTs. Northern blot analysis revealed that these transcripts were larger than predicted from the tiling array data. Quantitative real-time RT-PCR across a panel of normal cell lines indicated that these transcripts are more abundantly expressed in rapidly growing tissues or in tissues that are more prone to cellular stress. Transcripts whose expression increased after exposure to NNK also showed increased expression in a number of lung cancer cell lines and in many breast cancer cell lines. Collectively, our results identify a new class of long, stress-responsive non-coding transcripts, LSINCTs, whose expression increases in response to DNA damage induced by NNK. Interestingly, LSINCTs also show increased expression in a number of cancer-derived cell lines, suggesting that their expression is linked to both cellular stress and cancer.
Common fragile sites (CFSs) are large regions of profound genomic instability found in all individuals. Spanning the centers of the two most frequently expressed CFS regions, FRA3B (3p14.3) and FRA16D (16q23.2), are the 1.5 Mb FHIT gene and the 1.0 Mb WWOX gene, respectively. These genes are frequently deleted and/or altered in many different cancers. Both FHIT and WWOX have been demonstrated to function as tumor suppressors in vitro and in vivo. A number of other large CFS genes have been identified and are also frequently inactivated in multiple cancers. Based on these data, we tested whether several additional very large genes lie within CFS regions; DCC and RAD51L1 did not. However, the 2.0 Mb DMD gene and its immediately distal neighbor, the 1.8 Mb IL1RAPL1 gene, are CFS genes contained within the FRAXC CFS region (Xp21.2→p21.1). Both are abundantly expressed in normal brain but were dramatically underexpressed in every brain tumor cell line and xenograft (derived from an intracranial model of glioblastoma multiforme) examined. We also studied the expression of eleven other large CFS genes in the same panel of brain tumor cell lines and xenografts and found reduced expression of multiple large CFS genes in these samples. In this report, we show that there is selective loss of specific large CFS genes in different cancers that does not appear to be mediated by the relative instability of different CFS regions. Further, the inactivation of multiple large CFS genes in brain tumor cell lines and xenografts may help explain why this type of cancer is highly aggressive and associated with a poor clinical outcome.
Common fragile sites (CFSs) are large, genomically unstable regions that are hotspots for deletions and other alterations, especially in cancer cells. Several have been shown to contain genes that span large genomic regions, such as FHIT (1.5 Mb), WWOX (1.0 Mb), GRID2 (1.36 Mb), PARK2 (1.3 Mb), and RORA (730 kb). These genes are frequently inactivated in multiple different cancers, and FHIT and WWOX have been shown to function as tumor suppressors. The disabled-1 gene (DAB1) is one of the human homologs of the Drosophila disabled locus; in mammals, it is involved in neuronal migration and lamination in the developing cerebral cortex. In mice, DAB1 inactivation results in the neurological mutant Scrambler, which resembles mice with inactivation of PARK2 (Quaker), GRID2 (Lurcher), and RORA (Staggerer). We asked whether DAB1 is another large CFS gene that might be important in cancer development. Here we demonstrate that the human DAB1 gene (spanning 1.25 Mb) maps within the FRA1B CFS region on chromosomal band 1p32.2. Real-time RT-PCR analysis revealed that DAB1 expression was decreased in many human cancer samples, including primary tumor tissues and cancer-derived cell lines, from several different cancers, especially brain and endometrial cancers. Additionally, introducing a DAB1 overexpression plasmid into two different cell lines with negligible endogenous DAB1 expression decreased cell growth. In summary, DAB1 is another gene that resides within an unstable CFS region and might play a role in human tumorigenesis. These data may provide a further link between neurological development and cancer.
Both arsenic and benzo[a]pyrene (BaP) inhibit terminal differentiation and alter growth potential in normal human epidermal keratinocytes (NHEK) in vitro. To identify molecular alterations that may be involved in these cellular processes, we carried out microarray analysis on NHEK treated with BaP or arsenic. The gene expression microarray results measuring mRNA levels were as follows: (1) in total, 2.0 µM BaP induced the expression of 85 genes and suppressed 17 genes; (2) arsenic at an equitoxic dose (5.0 µM) induced the expression of 106 genes and suppressed 15 genes. Quantitative real-time RT-PCR was subsequently used to confirm the microarray findings for selected genes involved in keratinocyte growth and differentiation pathways. These studies confirmed that BaP increased mRNA levels in NHEK of alpha-integrin binding protein 63 (AIBP63) (2.48-fold), retinoic acid- and interferon-inducible protein (IFIT5) (2.74-fold), interleukin-1 alpha (IL1A) (2.64-fold), interleukin-1 beta (IL1B) (2.84-fold) and Ras guanyl releasing protein 1 (RASGRP1) (3.14-fold). Real-time RT-PCR confirmed that arsenic increased mRNA levels of the following genes: retinoblastoma 1 (RB1) (5.4-fold), retinoblastoma-binding protein 1 (ARID4A) (6.8-fold), transforming growth factor beta-stimulated protein (TSC22D1) (6.84-fold), MAX binding protein (MNT) (2.44-fold), and RAD50 (4.24-fold). Collectively, these results indicate that these chemicals target different genes and molecular pathways involved in the regulatory processes controlling NHEK proliferation and differentiation. Mechanistic studies of a subset of these genes may allow alterations in these molecular markers to be correlated with chemical-specific blocks to differentiation in NHEK.
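Fold changes such as these are commonly derived from real-time RT-PCR threshold cycles (Ct) with the comparative 2^-ΔΔCt method. The abstract does not state how the values were computed, so the following is a hedged sketch of that standard calculation, assuming a single reference gene and roughly 100% amplification efficiency. The function name, argument names, and Ct values are illustrative.

```python
def fold_change_ddct(ct_target_treated: float, ct_ref_treated: float,
                     ct_target_control: float, ct_ref_control: float) -> float:
    """Relative expression (treated vs. control) by the comparative 2^-ddCt method.

    Assumes near-100% PCR efficiency for both the target and the reference gene,
    so that each cycle doubles the product.
    """
    # Normalize the target gene to the reference gene within each condition.
    d_ct_treated = ct_target_treated - ct_ref_treated
    d_ct_control = ct_target_control - ct_ref_control
    # Compare the normalized values between conditions.
    dd_ct = d_ct_treated - d_ct_control
    # A lower Ct means more starting template, hence the negative exponent.
    return 2.0 ** (-dd_ct)


# Hypothetical Ct values (not from the study): the target crosses threshold
# one cycle earlier after treatment while the reference gene is unchanged.
print(fold_change_ddct(24.0, 18.0, 25.0, 18.0))  # 2.0
```

If the reported values were computed this way, the 2.48-fold increase for AIBP63 would correspond to a ΔΔCt of about -1.31 cycles (log2 2.48 ≈ 1.31).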