Only a small proportion of the mouse genome is transcribed into mature messenger RNA transcripts. There is an international collaborative effort to identify all full-length mRNA transcripts from the mouse, and to ensure that each is represented in a physical collection of clones. Here we report the manual annotation of 60,770 full-length mouse complementary DNA sequences. These are clustered into 33,409 'transcriptional units', contributing 90.1% of a newly established mouse transcriptome database. Of these transcriptional units, 4,258 are new protein-coding and 11,665 are new non-coding messages, indicating that non-coding RNA is a major component of the transcriptome. 41% of all transcriptional units showed evidence of alternative splicing. In protein-coding transcripts, 79% of splice variations altered the protein product. Whole-transcriptome analyses resulted in the identification of 2,431 sense-antisense pairs. The present work, completely supported by physical clones, provides the most comprehensive survey of a mammalian transcriptome so far, and is a valuable resource for functional genomics.
Data integration is key to knowledge discovery in the age of genomics and represents a major, longstanding challenge for the genome informatics community. Integration of data across heterogeneous genome databases requires the identification of common data entities and mechanisms to ensure referential integrity and persistence of these common entities even as our understanding of their biological properties changes. The Mouse Genome Informatics (MGI) database group employs multiple strategies for achieving data integration and maintaining inter-connectedness with other databases including 1) the use of permanent, unique accession IDs for identifying core data entities, 2) the application of nomenclature standards for naming genes and strains of mice, and 3) the development and implementation of controlled vocabularies and ontologies to ensure semantic consistency of biological concepts within and across model organism databases.
0-7695-0862-6/00 $10.00 © 2000 IEEE
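The first two strategies above (permanent accession IDs plus revisable nomenclature) can be illustrated with a minimal sketch. This is not MGI's implementation: the `GeneRecord` and `GeneRegistry` names, the `EX:0001` identifier, the `GeneA`/`GeneB` symbols, and the `VOCAB:` term ID are all hypothetical placeholders chosen for illustration. The idea shown is only that external links store the stable ID, so they keep resolving when a gene symbol is revised.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class GeneRecord:
    accession_id: str  # permanent identifier; never changed or reused (assumption)
    symbol: str        # current official symbol; may be revised over time
    vocab_terms: Set[str] = field(default_factory=set)  # controlled-vocabulary term IDs


class GeneRegistry:
    """Hypothetical registry that resolves IDs and symbols to one record."""

    def __init__(self) -> None:
        self._by_id: Dict[str, GeneRecord] = {}
        self._symbol_to_id: Dict[str, str] = {}

    def add(self, record: GeneRecord) -> None:
        if record.accession_id in self._by_id:
            raise ValueError(f"accession ID already in use: {record.accession_id}")
        self._by_id[record.accession_id] = record
        self._symbol_to_id[record.symbol] = record.accession_id

    def rename(self, accession_id: str, new_symbol: str) -> None:
        # The old symbol stays in the lookup table as a synonym, so
        # references made under the old name still resolve.
        record = self._by_id[accession_id]
        self._symbol_to_id[new_symbol] = accession_id
        record.symbol = new_symbol

    def resolve(self, key: str) -> GeneRecord:
        # Accept either a stable accession ID or any known symbol.
        accession_id = key if key in self._by_id else self._symbol_to_id.get(key)
        if accession_id is None:
            raise KeyError(key)
        return self._by_id[accession_id]


registry = GeneRegistry()
registry.add(GeneRecord("EX:0001", "GeneA", {"VOCAB:0000001"}))
registry.rename("EX:0001", "GeneB")
print(registry.resolve("GeneA").symbol)
```

Keying every cross-reference on the accession ID rather than the symbol is what lets the name change without breaking links held by other databases.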
The daily and seasonal movements of the walleye, Stizostedion vitreum, were studied by use of a VHF radio‐tag system for three seasons each year during 1973 and 1974 in Lake Bemidji, a large eutrophic lake in north central Minnesota. The walleye moved parallel to the shore and did not show a diel on‐shore and off‐shore movement pattern. Submerged projections on the lake bottom such as rock bars or spits tended to serve as boundaries of the ranges within which the test fish moved. Large movements of tagged fish coincided with extensive periods of heavy cloud cover and precipitation. Wind speed and direction also appeared to influence fish movement especially when the fetch was large. Marked daily changes in water temperature and dissolved oxygen concentration did not occur, and, thus, had little effect on the movement of test fish. There were slight seasonal changes in the extent of the daily movement of test fish, but the depth ranges shown by the tagged fish did not change greatly for each season.
The Mouse Genome Sequencing Consortium and the RIKEN Genome Exploration Research group have generated large sets of sequence data representing the mouse genome and transcriptome, respectively. These data provide a valuable foundation for genomic research. The challenges for the informatics community are how to integrate these data with the ever-expanding knowledge about the roles of genes and gene products in biological processes, and how to provide useful views to the scientific community. Public resources, such as the National Center for Biotechnology Information (NCBI; http://www.ncbi.nih.gov), and model organism databases, such as the Mouse Genome Informatics database (MGI; http://www.informatics.jax.org), maintain the primary data and provide connections between sequence and biology. In this paper, we describe how the partnership of MGI and NCBI LocusLink contributes to the integration of sequence and biology, especially in the context of the large-scale genome and transcriptome data now available for the laboratory mouse. In particular, we describe the methods and results of integration of 60,770 FANTOM2 mouse cDNAs with gene records in the databases of MGI and LocusLink.
Large-scale sequencing and annotation efforts, such as the human and mouse genome sequencing initiatives (Lander et al. 2001; Waterston et al. 2002), the RIKEN full-length enriched cDNA sequencing project (Kawai et al. 2001; Okazaki et al. 2002), and the Mammalian Gene Collection (MGC; Strausberg et al. 1999), have made publicly available a wealth of genomic and transcript information to support diverse research efforts related to understanding mammalian biology and disease. Now more than ever, users need easy access to integrated views of, and analysis tools for, high-quality information about mammalian genes and genomes.
The challenge is to develop strategies for integrating these data with continually emerging knowledge about the function, variation, and regulation of genes and other genomic features. The collaboration between the Mouse Genome Informatics (MGI) group and the National Center for Biotechnology Information's (NCBI) LocusLink and RefSeq groups (http://www.ncbi.nih.gov/) exemplifies how coordinated efforts facilitate connectivity between sequence and biology in the mouse. The MGI resource provides highly integrated and curated views of genetic, genomic, and biological data for the laboratory mouse. LocusLink, through the Reference Sequence (RefSeq) project, connects biological information to the sequences of reference chromosomes, RNAs, and proteins. The MGI/LocusLink collaboration has its greatest impact through the mutual determination of the sequences which best define mouse genes. Once gene-to-sequence(s) connections are established, and associations to available information about the genes are made, the foundation is set for additional computation, curation, and Internet connectivity for the scientific community.