This study describes comprehensive polling of transcription start and termination sites and analysis of previously unidentified full-length complementary DNAs derived from the mouse genome. We identify the 5' and 3' boundaries of 181,047 transcripts with extensive variation in transcripts arising from alternative promoter usage, splicing, and polyadenylation. There are 16,247 new mouse protein-coding transcripts, including 5154 encoding previously unidentified proteins. Genomic mapping of the transcriptome reveals transcriptional forests, with overlapping transcription on both strands, separated by deserts in which few transcripts are observed. The data provide a comprehensive platform for the comparative analysis of mammalian transcriptional regulation in differentiation and development.
Two simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions are presented. Although they give no weights to different types of codon substitutions, these methods give essentially the same results as those obtained by Miyata and Yasunaga's and by Li et al.'s methods. Computer simulation indicates that estimates of synonymous substitutions obtained by the two methods are quite accurate unless the number of nucleotide substitutions per site is very large. It is shown that all available methods tend to give an underestimate of the number of nonsynonymous substitutions when the number is large.
Rice, one of the world's most important food plants, has important syntenic relationships with the other cereal species and is a model plant for the grasses. Here we present a map-based, finished quality sequence that covers 95% of the 389 Mb genome, including virtually all of the euchromatin and two complete centromeres. A total of 37,544 nontransposable-element-related protein-coding genes were identified, of which 71% had a putative homologue in Arabidopsis. In a reciprocal analysis, 90% of the Arabidopsis proteins had a putative homologue in the predicted rice proteome. Twenty-nine per cent of the 37,544 predicted genes appear in clustered gene families. The number and classes of transposable elements found in the rice genome are consistent with the expansion of syntenic regions in the maize and sorghum genomes. We find evidence for widespread and recurrent gene transfer from the organelles to the nuclear chromosomes. The map-based sequence has proven useful for the identification of genes underlying agronomic traits. The additional single-nucleotide polymorphisms and simple sequence repeats identified in our study should accelerate improvements in rice production.
The frequencies of each of the 257 468 complete protein coding sequences (CDSs) have been compiled from the taxonomical divisions of the GenBank DNA sequence database. The sum of the codons used by 8792 organisms has also been calculated. The data files can be obtained from the anonymous ftp sites of DDBJ, Kazusa and EBI. A list of the codon usage of genes and the sum of the codons used by each organism can be obtained through the web site http://www.kazusa.or.jp/codon/. The present study also reports recent developments on the WWW site. The new web interface provides data in the CodonFrequency-compatible format as well as in the traditional table format. The use of the database is facilitated by keyword based search analysis and the availability of codon usage tables for selected genes from each species. These new tools will provide users with the ability to further analyze for variations in codon usage among different genomes.
The freshwater cnidarian Hydra was first described in 17021 and has been the object of study for 300 years. Experimental studies of Hydra between 1736 and 1744 culminated in the discovery of asexual reproduction of an animal by budding, the first description of regeneration in an animal, and successful transplantation of tissue between animals2. Today, Hydra is an important model for studies of axial patterning3, stem cell biology4 and regeneration5. Here we report the genome of Hydra magnipapillata and compare it to the genomes of the anthozoan Nematostella vectensis6 and other animals. The Hydra genome has been shaped by bursts of transposable element expansion, horizontal gene transfer, trans-splicing, and simplification of gene structure and gene content that parallel simplification of the Hydra life cycle. We also report the sequence of the genome of a novel bacterium stably associated with H. magnipapillata. Comparisons of the Hydra genome to the genomes of other animals shed light on the evolution of epithelia, contractile tissues, developmentally regulated transcription factors, the Spemann–Mangold organizer, pluripotency genes and the neuromuscular junction.
Quinoa (Chenopodium quinoa Willd., 2n = 4x = 36) is a highly nutritious crop that is adapted to thrive in a wide range of agroecosystems. It was presumably first domesticated more than 7,000 years ago by pre-Columbian cultures and was known as the 'mother grain' of the Incan Empire 1 . Quinoa has adapted to the high plains of the Andean Altiplano (> 3,500 m above sea level), where it has developed tolerance to several abiotic stresses [2][3][4] . Quinoa has gained international attention because of the nutritional value of its seeds, which are gluten-free, have a low glycaemic index 5 , and contain an excellent balance of essential amino acids, fibre, lipids, carbohydrates, vitamins, and minerals 6 . Quinoa has the potential to provide a highly nutritious food source that can be grown on marginal lands not currently suitable for other major crops. This potential was recognized when the United Nations declared 2013 as the International Year of Quinoa, this being one of only three times a plant has received such a designation.Despite its agronomic potential, quinoa is still an underutilized crop 7 , with relatively few active breeding programs 8 . Breeding efforts to improve the crop for important agronomic traits are needed to expand quinoa production worldwide. To accelerate the improvement of quinoa, we present here the allotetraploid quinoa genome. We demonstrate the utility of the genome sequence by identifying a gene that probably regulates the presence of seed triterpenoid saponin content. Moreover, we sequenced the genomes of additional diploid and tetraploid Chenopodium species to characterize genetic diversity within the primary germplasm pool for quinoa and to understand sub-genome evolution in quinoa. Together, these resources provide the foundation for accelerating the genetic improvement of the crop, with the objective of enhancing global food security for a growing world population. Sequencing, assembly and annotationWe sequenced and assembled the genome of the coastal Chilean quinoa accession PI 614886 (BioSample accession code SAMN04338310) using single-molecule real-time (SMRT) sequencing technology from Pacific Biosciences (PacBio) and optical and chromosome-contact maps from BioNano Genomics 9 and Dovetail Genomics 10 . The assembly contains 3,486 scaffolds, with a scaffold N50 of 3.84 Mb and 90% of the assembled genome contained in 439 scaffolds (Table 1). The total assembly size of 1.39 gigabases (Gb) is similar to the reported size estimates of the quinoa genome (1.45-1.50 Gb (refs 11,12)). To combine scaffolds into pseudomolecules, an existing linkage map from quinoa 13 was integrated with two new linkage maps. The resulting map (Extended Data Fig. 1) of 6,403 unique markers spans a total length of 2,034 centimorgans (cM) and consists of 18 linkage groups (Supplementary Table 7), corresponding to the haploid chromosome number of quinoa. Pseudomolecules (hereafter referred to as chromosomes, which are numbered according to a previously published single-nucleotide polymorphism (SNP) linkage ...
Biosynthesis of anEven if a substituted amino acid were truly neutral in a functional sense, it is highly unlikely that factors involved in the synthesis of the protein would render the substitution neutral in the broader sense of the organism's integrated functioning. The substituted amino acid must be present within the cell in equivalent quantity compared with the original amino acid and, indeed, its synthesis or derivation from other molecules and transport into the cell require an equivalent amount of energy output.R. C. Richmond (1)
Retroviruses are the only group of viruses known to have left a fossil record, in the form of endogenous proviruses, and some of 8% of the human genome is made up of these elements 1,2 . Although many other viruses, including non-retroviral RNA viruses, are known to generate DNA forms of their own genomes during replication3 , 4, none has been found as DNA in the germline of animals. Bornaviruses, a nonsegmented, negative-sense RNA virus, are unique among RNA viruses in that they establish persistent infection in the cell nucleus5 -7 . Here we show that elements homologous to the nucleoprotein (N) gene of Bornavirus exist in the genomes of several mammalian species, including humans, non-human primates, rodents and elephants. These sequences have been designated endogenous Bornalike N (EBLN) elements. Some of the primate EBLNs contain an intact open reading frame (ORF) and are expressed as mRNA. Phylogenetic analyses revealed that EBLNs appear to have been generated by different insertional events in each specific animal family. Furthermore, the EBLN of a ground squirrel was formed by a recent integration event, while those in primates must have been formed more than 40 million years ago. We also show that the N mRNA of a current mammalian Bornavirus, Borna disease virus (BDV), can form EBLN-like elements in the genomes of persistently infected cultured cells. Our results provide the first evidence for endogenization of non-retroviral virus-derived elements in mammalian genomes and give †To whom correspondence should be addressed. Dr. Keizo Tomonaga:
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.