Despite their importance in gene innovation and phenotypic variation, duplicated regions have remained largely intractable due to difficulties in accurately resolving their structure, copy number and sequence content. We present an algorithm (mrFAST) to comprehensively map next-generation sequence reads allowing for the prediction of absolute copy-number variation of duplicated segments and genes. We examine three human genomes and experimentally validate genome-wide copy-number differences. We estimate that 73–87 genes will be on average copy-number variable between two human genomes and find that these genic differences overwhelmingly correspond to segmental duplications (OR=135; p<2.2e-16). Our method can distinguish between different copies of highly identical genes, providing a more accurate census of gene content and insight into functional constraint without the limitations of array-based technology.
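The read-depth idea behind absolute copy-number prediction can be illustrated with a minimal sketch. It assumes that reads have already been mapped to all of their possible locations and counted in fixed windows, and that a set of control windows of known diploid copy number is available. The function name `estimate_copy_number`, the example counts, and the simple mean-based normalization are illustrative assumptions; the abstract does not describe how mrFAST or the downstream copy-number caller is implemented.

```python
from statistics import mean


def estimate_copy_number(window_depths, control_depths, control_copy=2):
    """Estimate absolute copy number per window from read depth.

    Assumes depth scales linearly with copy number and that
    control_depths come from regions of known copy number.
    """
    if not control_depths:
        raise ValueError("need at least one control window")
    baseline = mean(control_depths)
    if baseline <= 0:
        raise ValueError("control depth must be positive")
    per_copy = baseline / control_copy  # expected depth contributed by one copy
    return [depth / per_copy for depth in window_depths]


# Hypothetical read counts per window (illustrative numbers only)
control = [98, 102, 100, 100]
regions = [100, 250, 50]
estimates = estimate_copy_number(regions, control)
rounded = [round(cn) for cn in estimates]
```

A practical pipeline would likely also need to correct for sequence-composition biases such as GC content before normalizing; that step is omitted here.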
The diversity of microRNAs and small-interfering RNAs has been extensively explored within angiosperms by focusing on a few key organisms such as Oryza sativa and Arabidopsis thaliana. A deeper division of the plants is defined by the radiation of the angiosperms and gymnosperms, with the latter comprising the commercially important conifers. The conifers are expected to provide important information regarding the evolution of highly conserved small regulatory RNAs. Deep sequencing provides the means to characterize and quantitatively profile small RNAs in understudied organisms such as these. Pyrosequencing of small RNAs from O. sativa revealed, as expected, ∼21- and ∼24-nt RNAs. The former contained known microRNAs, and the latter largely comprised intergenic-derived sequences likely representing heterochromatin siRNAs. In contrast, sequences from Pinus contorta were dominated by 21-nt small RNAs. Using a novel sequence-based clustering algorithm, we identified sequences belonging to 18 highly conserved microRNA families in P. contorta as well as numerous clusters of conserved small RNAs of unknown function. Using multiple methods, including expressed sequence folding and machine learning algorithms, we found a further 53 candidate novel microRNA families, 51 appearing specific to the P. contorta library. In addition, alignment of small RNA sequences to the O. sativa genome revealed six perfectly conserved classes of small RNA that included chloroplast transcripts and specific types of genomic repeats. The conservation of microRNAs and other small RNAs between the conifers and the angiosperms indicates that important RNA silencing processes were highly developed in the earliest spermatophytes. Genomic mapping of all sequences to the O. sativa genome can be viewed at http://microrna.bcgsc.ca/cgi-bin/gbrowse/rice_build_3/. [Supplemental material is available online at www.genome.org.]
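The contrast between a library with ∼21- and ∼24-nt peaks and one dominated by 21-nt reads comes down to a read-length profile. The sketch below shows one way such a profile could be computed. The function name `length_profile` and the placeholder reads are hypothetical; they are not the authors' pipeline or data.

```python
from collections import Counter


def length_profile(reads):
    """Return the fraction of reads observed at each read length."""
    counts = Counter(len(read) for read in reads)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {length: n / total for length, n in sorted(counts.items())}


# Placeholder reads (made-up sequences, used only for their lengths)
example_reads = ["A" * 21, "C" * 21, "G" * 24, "U" * 21]
profile = length_profile(example_reads)
```

Comparing such profiles between two libraries gives a quick view of which size classes dominate each one.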
The heterochromatin siRNAs are a diverse set of 24-nt-long small RNAs that are processed by DCL3 from double-stranded RNA precursors produced by RDR2 (Xie et al. 2004). These RNAs are involved in heterochromatin formation and maintenance by directing sequence-specific DNA and histone methylation of transposable elements and some larger genomic loci (Pontier et al. 2005). Other 24-nt-long siRNAs produced by DCL2 in A. thaliana can direct an initial cleavage of target transcripts, which are further cleaved into 21-nt siRNAs by DCL1 (Borsani et al. 2005). Finally, the trans-acting siRNAs (tasiRNAs), which are 21 nt long, are matured by a poorly understood mechanism involving DCL4. These tasiRNAs perform post-transcriptional gene silencing much like the miRNAs (Xie et al. 2004).
Identification of functional small RNAs in other plant species has, until recently, been accomplished by searching for homologous sequences in expressed sequence data (Zhang et al. 2006a) and genomic sequences (Bonnet et al. 2004) and has been, with a few exceptions (Williams et al. 2005; Talmor-Neiman et al. 2006), limited to the discovery of the more highly cons...
Wilson and King were among the first to recognize that the extent of phenotypic change between humans and great apes was dissonant with the rate of molecular change. Proteins are virtually identical [1,2]; cytogenetically there are few rearrangements that distinguish ape-human chromosomes [3]; rates of single-basepair change [4-7] and retroposon activity [8-10] have slowed particularly within hominid lineages when compared to rodents or monkeys. Here, we perform a systematic analysis of duplication content of four primate genomes (macaque, orangutan, chimpanzee and human) in an effort to understand the pattern and rates of genomic duplication during hominid evolution. We find that the ancestral branch leading to human and African great apes shows the most significant increase in duplication activity both in terms of basepairs and in terms of events. This duplication acceleration within the ancestral species is significant when compared to lineage-specific rate estimates even after accounting for copy-number polymorphism and homoplasy. We discover striking examples of recurrent and independent gene-containing duplications within the gorilla and chimpanzee that are absent in the human lineage. Our results suggest that the evolutionary properties of copy-number mutation differ significantly from other forms of genetic mutation and, in contrast to the hominid slowdown of single basepair mutations, there has been a genomic burst of duplication activity at this period during human evolution.
We began by developing a segmental duplication map for each of the four primate genomes (macaque, orangutan, chimpanzee and human) (Fig. S1). The approach is based on the alignment of whole-genome shotgun (WGS) sequence data against the human reference genome and predicts high-identity segmental duplications (SDs) based on excess depth of
Correspondence and requests for materials should be addressed to: Evan E. Eichler (eee@gs.washington.edu) or Tomas Marques-Bonet (tmarques@u.washington.edu).
AUTHOR CONTRIBUTIONS: EEE planned the project. MV and MC performed the FISH experiments. TAG, LWH, LAF, ERM and RKW generated the orangutan WGS sequences. TMB, JMK, ZC, ZJ, LC, EEE and SG analyzed the data. CB performed the ArrayCGH experiments. TMB, RMB and PS characterized the chr10 expansion. CA and GA generated the Venter/Watson comparative duplication maps. AN developed the maximum likelihood evolutionary model. TMB, JMK and EEE wrote the paper.
Author information: Reprints and permissions information is available at www.nature.com/reprints.
(Table 1, Table S1 and Supplementary Note Table 2). By this criterion, we characterized 73 Mbp corresponding to the duplications identified in at least one of the four primate species, correcting for copy number in each primate (Methods). We furthermore characterized each duplication as "lineage-specific" or "shared", depending on whether it was seen in only one or multiple genomes. This comparative map (Fig. S3, S4) is available as an interactive UCSC mirror browser, http://humanparalogy.gs.w...