Allison Piovesan scite author profile

Objective: Basic parameters commonly used to describe genomes, including length, weight and relative guanine-cytosine (GC) content, are widely cited in the absence of a primary source. Using updated data and original software, we determined these values, to the best of our knowledge, as a standard reference for the whole human nuclear genome, for each chromosome and for mitochondrial DNA. We also devised a method to calculate the relative GC content of the whole messenger RNA sequence set and of transcriptomes by multiplying the GC content of each gene by its mean expression level.

Results: The male nuclear diploid genome extends for 6.27 gigabase pairs (Gbp), is 205.00 centimetres (cm) long and weighs 6.41 picograms (pg). The corresponding female values are 6.37 Gbp, 208.23 cm and 6.51 pg. Individual variability and its implications for DNA informational density, in terms of bits per volume, are discussed. The genomic GC content is 40.9%. Analysing different transcriptomes and species, we showed that the greatest deviation was observed in the pathological condition analysed (trisomy 21 leukaemic cells) and in Caenorhabditis elegans. Our results may provide a solid basis for further investigation of human structural and functional genomics, as well as a framework for other comparative genome analyses.

Electronic supplementary material: The online version of this article (10.1186/s13104-019-4137-z) contains supplementary material, which is available to authorized users.
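The expression-weighted GC content described above can be sketched in Python as follows. This is a minimal illustration under our own assumptions, not the authors' software: GC content is counted over unambiguous A/C/G/T bases only, and the weighted value is normalized by total expression (the sum of GC_i × E_i divided by the sum of E_i), a step the abstract does not spell out. The gene names and values in the usage lines are hypothetical.

```python
from collections.abc import Mapping


def gc_fraction(seq: str) -> float:
    """GC fraction over unambiguous bases; N and other IUPAC codes are ignored."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return gc / (gc + at)


def expression_weighted_gc(gc_by_gene: Mapping[str, float],
                           expr_by_gene: Mapping[str, float]) -> float:
    """Multiply each gene's GC fraction by its mean expression, then divide by
    total expression (the normalization step is an assumption).
    Genes missing from either mapping are skipped."""
    shared = gc_by_gene.keys() & expr_by_gene.keys()
    total_expr = sum(expr_by_gene[g] for g in shared)
    if total_expr <= 0:
        raise ValueError("total expression of shared genes must be positive")
    return sum(gc_by_gene[g] * expr_by_gene[g] for g in shared) / total_expr


# Toy usage with hypothetical genes and expression levels.
print(gc_fraction("GGCCAATTNN"))  # 0.5
print(round(expression_weighted_gc({"GENE_A": 0.60, "GENE_B": 0.40},
                                   {"GENE_A": 30.0, "GENE_B": 10.0}), 3))  # 0.55
```

Weighting by expression means a highly expressed GC-rich gene pulls the transcriptome value upward more than a rarely expressed one, which is why transcriptome GC content can differ from the genomic 40.9%.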


Human protein-coding genes and gene feature statistics in 2019

Piovesan et al. 2019


Objective: A well-known limitation of genome browsers is that their large amount of genome and gene data is not organized as a searchable database, which hampers full management of numerical data and free calculations. Given the continuous growth of data deposited in genomic repositories, periodic revision and analysis of their content is advisable. Using GeneBase, software with a graphical interface able to import and process National Center for Biotechnology Information (NCBI) Gene database entries, we provide tabulated spreadsheets, updated to 2019, on the human nuclear protein-coding gene data set, ready to be used for any type of analysis of genes, transcripts and gene organization.

Results: Comparison with previous reports reveals substantial changes in the number of known nuclear protein-coding genes (now 19,116), in the protein-coding non-redundant transcriptome space [now 59,281,518 base pairs (bp), a 10.1% increase] and in the number of exons (now 562,164, a 36.2% increase), owing to a marked increase in the number of recorded RNA isoforms. Other parameters, such as mean and extreme gene, exon and intron lengths, appear to have reached a stability that is unlikely to be substantially modified by human genome data updates, at least for protein-coding genes. Finally, we confirm that there are no human introns shorter than 30 bp.
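A minimal sketch of how intron lengths can be derived from exon coordinates, and the shortest intron found, as needed to check a claim such as "no human introns shorter than 30 bp". The coordinate convention (1-based, inclusive exon coordinates on one strand), the function names and the toy transcripts are illustrative assumptions; GeneBase itself reads NCBI Gene entries, and this sketch does not reproduce its parser.

```python
from typing import Dict, List, Optional, Tuple

Exon = Tuple[int, int]  # (start, end), 1-based inclusive genomic coordinates (assumed)


def intron_lengths(exons: List[Exon]) -> List[int]:
    """Lengths of the gaps between consecutive exons of one transcript."""
    ordered = sorted(exons)
    lengths = []
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        gap = next_start - prev_end - 1  # bases strictly between the two exons
        if gap < 1:
            raise ValueError("overlapping or abutting exons in one transcript")
        lengths.append(gap)
    return lengths


def shortest_intron(transcripts: Dict[str, List[Exon]]) -> Optional[Tuple[str, int]]:
    """(transcript_id, length) of the shortest intron, or None if no transcript has introns."""
    best: Optional[Tuple[str, int]] = None
    for tx_id, exons in transcripts.items():
        for length in intron_lengths(exons):
            if best is None or length < best[1]:
                best = (tx_id, length)
    return best


# Toy transcripts with hypothetical coordinates.
toy = {
    "TX1": [(100, 200), (300, 350), (381, 400)],  # introns of 99 and 30 bp
    "TX2": [(1000, 1100), (1500, 1600)],          # one intron of 399 bp
}
print(shortest_intron(toy))  # ('TX1', 30)
```

Run over every annotated isoform rather than one representative transcript per gene, since alternative isoforms can introduce introns that the canonical transcript lacks.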


Extensive microRNA-mediated crosstalk between lncRNAs and mRNAs in mouse embryonic stem cells

et al. 2015


Recently, a handful of intergenic long noncoding RNAs (lncRNAs) have been shown to compete with mRNAs for binding to miRNAs and to contribute to development and disease. Beyond these reports, little is yet known about the extent and functional consequences of miRNA-mediated regulation of mRNA levels by lncRNAs. To gain further insight into miRNA-mediated lncRNA-mRNA crosstalk, we reanalyzed transcriptome-wide changes induced by the targeted knockdown of over 100 lncRNA transcripts in mouse embryonic stem cells (mESCs). We predicted that, on average, almost one-fifth of the transcript level changes induced by lncRNAs depend on miRNAs that are highly abundant in mESCs. We validated these findings experimentally by temporally profiling transcriptome-wide changes in gene expression following the loss of miRNA biogenesis in mESCs. Following miRNA depletion, we found that >50% of lncRNAs and their miRNA-dependent mRNA targets were coordinately up-regulated, consistent with their interaction being miRNA-mediated. These lncRNAs are preferentially located in the cytoplasm, and the miRNA response elements they share with their targets have been preserved in mammals by purifying selection. Lastly, the miRNA-dependent mRNA targets of each lncRNA tended to share common biological functions. Post-transcriptional miRNA-mediated crosstalk between lncRNAs and mRNAs in mESCs is thus surprisingly prevalent, conserved in mammals, and likely to contribute to critical developmental processes.
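As a minimal sketch of what "coordinately up-regulated" could mean operationally, the function below counts lncRNA/target pairs whose log2 fold changes after miRNA depletion both exceed a threshold. The pairwise scoring, the threshold and all identifiers and values are hypothetical; the abstract does not describe the authors' actual statistical procedure.

```python
from typing import Dict, List, Tuple


def coordinated_up_fraction(pairs: List[Tuple[str, str]],
                            log2fc: Dict[str, float],
                            threshold: float = 0.0) -> float:
    """Fraction of (lncRNA, mRNA) pairs in which both members have a log2 fold
    change above `threshold` (miRNA-depleted vs control). Pairs lacking a value
    for either member are excluded."""
    scored = [(lnc, m) for lnc, m in pairs if lnc in log2fc and m in log2fc]
    if not scored:
        raise ValueError("no pair has fold changes for both members")
    both_up = sum(1 for lnc, m in scored
                  if log2fc[lnc] > threshold and log2fc[m] > threshold)
    return both_up / len(scored)


# Hypothetical identifiers and fold changes.
pairs = [("lncA", "mRNA1"), ("lncA", "mRNA2"), ("lncB", "mRNA3")]
log2fc = {"lncA": 0.8, "mRNA1": 0.5, "mRNA2": -0.2, "lncB": 1.1, "mRNA3": 0.3}
print(round(coordinated_up_fraction(pairs, log2fc), 3))  # 0.667
```

A real analysis would also need a null expectation (for example, the fraction expected from unrelated lncRNA/mRNA pairs) before calling such a fraction meaningful.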


GeneBase 1.1: a tool to summarize data from NCBI gene datasets and its application to an update of human gene statistics

et al. 2016


We release GeneBase 1.1, a local tool with a graphical interface for parsing, structuring and indexing data from the National Center for Biotechnology Information (NCBI) Gene data bank. Compared with its predecessor, GeneBase 1.0, GeneBase 1.1 allows dynamic calculation and summarization, in terms of median, mean, standard deviation and total, of many quantitative parameters associated with genes, gene transcripts and gene features (exons, introns, coding sequences, untranslated regions). GeneBase 1.1 thus makes it possible to analyse the main gene structure parameters for any set of genes retrieved by a search for the desired characteristics, a functionality not provided by NCBI Gene itself. To show the potential of our tool for local parsing, structuring and dynamic summarizing of publicly available databases for data retrieval, analysis and testing of biological hypotheses, we provide, as a sample application, a revised set of statistics for human nuclear genes, gene transcripts and gene features. In contrast with previous estimates, which strongly underestimated the length of human genes, a ‘mean’ human protein-coding gene is 67 kbp long and has eleven 309-bp exons and ten 6355-bp introns. Median, mean and extreme values are provided for many other features, offering an updated reference source for human genome studies, data useful for setting the parameters of bioinformatic tools, and interesting clues to the biomedical meaning of the gene features themselves.

Database URL: http://apollo11.isto.unibo.it/software/
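The kind of summary GeneBase 1.1 reports (median, mean, standard deviation and total) can be approximated with Python's standard `statistics` module. This is a sketch, not GeneBase code: the function name, the choice of sample (rather than population) standard deviation and the exon lengths in the usage line are our assumptions.

```python
import statistics
from typing import Dict, Iterable


def summarize(values: Iterable[float]) -> Dict[str, float]:
    """Median, mean, sample standard deviation, total and extremes of a parameter."""
    data = list(values)
    if not data:
        raise ValueError("no values to summarize")
    return {
        "n": len(data),
        "median": statistics.median(data),
        "mean": statistics.fmean(data),
        # sample SD is undefined for a single value; report 0.0 in that case
        "sd": statistics.stdev(data) if len(data) > 1 else 0.0,
        "total": sum(data),
        "min": min(data),
        "max": max(data),
    }


# Hypothetical exon lengths (bp) for a toy gene set.
print(summarize([120, 309, 95, 1500, 210]))
```

Reporting the median next to the mean matters for gene features: a few very long genes or introns pull the mean far above the typical value, so the two can differ substantially.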

