Knowing the total cell number of the human body as well as of individual organs is important from a cultural, biological, medical and comparative modelling point of view. The presented cell count could be a starting point for a common effort to complete the total calculation.
Objective A well-known limit of genome browsers is that the large amount of genome and gene data is not organized in the form of a searchable database, hampering full management of numerical data and free calculations. Due to the continuous increase of data deposited in genomic repositories, their content revision and analysis is recommended. Using GeneBase, a software with a graphical interface able to import and elaborate National Center for Biotechnology Information (NCBI) Gene database entries, we provide tabulated spreadsheets updated to 2019 about human nuclear protein-coding gene data set ready to be used for any type of analysis about genes, transcripts and gene organization. Results Comparison with previous reports reveals substantial change in the number of known nuclear protein-coding genes (now 19,116), the protein-coding non-redundant transcriptome space [now 59,281,518 base pair (bp), 10.1% increase], the number of exons (now 562,164, 36.2% increase) due to a relevant increase of the RNA isoforms recorded. Other parameters such as gene, exon or intron mean and extreme length appear to have reached a stability that is unlikely to be substantially modified by human genome data updates, at least regarding protein-coding genes. Finally, we confirm that there are no human introns shorter than 30 bp.
Objective Basic parameters commonly used to describe genomes including length, weight and relative guanine-cytosine (GC) content are widely cited in absence of a primary source. By using updated data and original software we determined these values to the best of our knowledge as standard reference for the whole human nuclear genome, for each chromosome and for mitochondrial DNA. We also devised a method to calculate the relative GC content in the whole messenger RNA sequence set and in transcriptomes by multiplying the GC content of each gene by its mean expression level. Results The male nuclear diploid genome extends for 6.27 Gigabase pairs (Gbp), is 205.00 cm (cm) long and weighs 6.41 picograms (pg). Female values are 6.37 Gbp, 208.23 cm, 6.51 pg. The individual variability and the implication for the DNA informational density in terms of bits/volume were discussed. The genomic GC content is 40.9%. Following analysis in different transcriptomes and species, we showed that the greatest deviation was observed in the pathological condition analysed (trisomy 21 leukaemic cells) and in Caenorhabditis elegans . Our results may represent a solid basis for further investigation on human structural and functional genomics while also providing a framework for other genome comparative analysis. Electronic supplementary material The online version of this article (10.1186/s13104-019-4137-z) contains supplementary material, which is available to authorized users.
BackgroundSeveral tools have been developed to perform global gene expression profile data analysis, to search for specific chromosomal regions whose features meet defined criteria as well as to study neighbouring gene expression. However, most of these tools are tailored for a specific use in a particular context (e.g. they are species-specific, or limited to a particular data format) and they typically accept only gene lists as input.ResultsTRAM (Transcriptome Mapper) is a new general tool that allows the simple generation and analysis of quantitative transcriptome maps, starting from any source listing gene expression values for a given gene set (e.g. expression microarrays), implemented as a relational database. It includes a parser able to assign univocal and updated gene symbols to gene identifiers from different data sources. Moreover, TRAM is able to perform intra-sample and inter-sample data normalization, including an original variant of quantile normalization (scaled quantile), useful to normalize data from platforms with highly different numbers of investigated genes. When in 'Map' mode, the software generates a quantitative representation of the transcriptome of a sample (or of a pool of samples) and identifies if segments of defined lengths are over/under-expressed compared to the desired threshold. When in 'Cluster' mode, the software searches for a set of over/under-expressed consecutive genes. Statistical significance for all results is calculated with respect to genes localized on the same chromosome or to all genome genes. Transcriptome maps, showing differential expression between two sample groups, relative to two different biological conditions, may be easily generated. We present the results of a biological model test, based on a meta-analysis comparison between a sample pool of human CD34+ hematopoietic progenitor cells and a sample pool of megakaryocytic cells. Biologically relevant chromosomal segments and gene clusters with differential expression during the differentiation toward megakaryocyte were identified.ConclusionsTRAM is designed to create, and statistically analyze, quantitative transcriptome maps, based on gene expression data from multiple sources. The release includes FileMaker Pro database management runtime application and it is freely available at http://apollo11.isto.unibo.it/software/, along with preconfigured implementations for mapping of human, mouse and zebrafish transcriptomes.
A ‘Down Syndrome critical region’ (DSCR) sufficient to induce the most constant phenotypes of Down syndrome (DS) had been identified by studying partial (segmental) trisomy 21 (PT21) as an interval of 0.6–8.3 Mb within human chromosome 21 (Hsa21), although its existence was later questioned. We propose an innovative, systematic reanalysis of all described PT21 cases (from 1973 to 2015). In particular, we built an integrated, comparative map from 125 cases with or without DS fulfilling stringent cytogenetic and clinical criteria. The map allowed to define or exclude as candidates for DS fine Hsa21 sequence intervals, also integrating duplication copy number variants (CNVs) data. A highly restricted DSCR (HR-DSCR) of only 34 kb on distal 21q22.13 has been identified as the minimal region whose duplication is shared by all DS subjects and is absent in all non-DS subjects. Also being spared by any duplication CNV in healthy subjects, HR-DSCR is proposed as a candidate for the typical DS features, the intellectual disability and some facial phenotypes. HR-DSCR contains no known gene and has relevant homology only to the chimpanzee genome. Searching for HR-DSCR functional loci might become a priority for understanding the fundamental genotype-phenotype relationships in DS.
The ideal reference, or control, gene for the study of gene expression in a given organism should be expressed at a medium-high level for easy detection, should be expressed at a constant/stable level throughout different cell types and within the same cell type undergoing different treatments, and should maintain these features through as many different tissues of the organism. From a biological point of view, these theoretical requirements of an ideal reference gene appear to be best suited to housekeeping (HK) genes. Recent advancements in the quality and completeness of human expression microarray data and in their statistical analysis may provide new clues toward the quantitative standardization of human gene expression studies in biology and medicine, both cross- and within-tissue. The systematic approach used by the present study is based on the Transcriptome Mapper tool and exploits the automated reassignment of probes to corresponding genes, intra- and inter-sample normalization, elaboration and representation of gene expression values in linear form within an indexed and searchable database with a graphical interface recording quantitative levels of expression, expression variability and cross-tissue width of expression for more than 31,000 transcripts. The present study conducted a meta-analysis of a pool of 646 expression profile data sets from 54 different human tissues and identified actin γ 1 as the HK gene that best fits the combination of all the traditional criteria to be used as a reference gene for general use; two ribosomal protein genes, RPS18 and RPS27, and one aquaporin gene, POM121 transmembrane nucleporin C, were also identified. The present study provided a list of tissue- and organ-specific genes that may be most suited for the following individual tissues/organs: Adipose tissue, bone marrow, brain, heart, kidney, liver, lung, ovary, skeletal muscle and testis; and also provides in these cases a representative, quantitative portrait of the relative, typical gene-expression profile in the form of searchable database tables.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.