Background: Several studies have shown that genomes can be studied via a multifractal formalism. Recently, we used a multifractal approach to study the genetic information content of the Caenorhabditis elegans genome. Here we investigate the possibility that the human genome shows a behavior similar to that observed in the nematode.

Results: We report here multifractality in the human genome sequence. This behavior correlates strongly with the presence of Alu elements and, to a lesser extent, with CpG islands and (G+C) content. In contrast, little or no relationship was found for LINE, MIR, MER, and LTR elements or for DNA regions poor in genetic information. The frequencies of gene functions, clusters of orthologous genes, metabolic pathways, and exons tended to increase across ranges of multifractality, and large gene families were located in genomic regions with varied multifractality. Additionally, a multifractal map and classification for human chromosomes are proposed.

Conclusions: Based on these findings, we propose a descriptive non-linear model for the structure of the human genome, with some biological implications. This model reveals 1) a multifractal regionalization in which many regions coexist far from equilibrium, and 2) a non-linear organization with significant molecular and medical genetic implications for understanding the role of Alu elements in the stability and structure of the human genome. Given the role of Alu sequences in gene regulation, genetic diseases, human genetic diversity, adaptation and phylogenetic analyses, these quantifications are especially useful.
ABSTRACT. The Caenorhabditis elegans genome has several regular and irregular characteristics in its nucleotide composition; these are observed within and between chromosomes. To study these particularities, we carried out a multifractal analysis, which requires a large number of exponents to characterize scaling properties. We looked for a relationship between the genetic information content of the chromosomes and multifractal parameters and found less multifractality than in the human genome. We observed differences in multifractality among chromosomes and among regions of chromosomes, as well as two group averages of chromosome regions. All these differences depended mainly on differences in repetitive DNA content. Based on these properties, we propose a nonlinear model for the structure of the C. elegans genome, with some biological implications. These results suggest that examining differences in multifractality is a viable approach for measuring local variations in genomic information content along chromosomes. This approach could be extended to other genomes in order to characterize structural and functional regions of chromosomes.
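To make "a large number of exponents" concrete, the following is a minimal sketch of how generalized dimensions D_q can be estimated from the scaling of a partition function over box sizes. It is not the authors' pipeline: the one-dimensional layout, the function names, and the binomial cascade used as a test measure are all assumptions made for illustration. The difference D_{-q} - D_{q} is shown only as one common summary of the degree of multifractality.

```python
import math

def binomial_cascade(p, levels):
    """Hypothetical test measure: binomial multiplicative cascade on 2**levels cells."""
    measure = [1.0]
    for _ in range(levels):
        # Split every cell in two, giving weight p to the left and 1 - p to the right.
        measure = [m * w for m in measure for w in (p, 1.0 - p)]
    return measure

def coarse_grain(measure, box):
    """Sum the measure over consecutive boxes of `box` cells."""
    return [sum(measure[i:i + box]) for i in range(0, len(measure), box)]

def slope(xs, ys):
    """Least-squares slope of ys against xs."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

def generalized_dimension(measure, q, boxes):
    """Estimate D_q as the slope of the scaled log partition function vs. log box size."""
    n = len(measure)
    log_eps, log_z = [], []
    for box in boxes:
        probs = [p for p in coarse_grain(measure, box) if p > 0]
        log_eps.append(math.log(box / n))
        if abs(q - 1.0) < 1e-12:
            # q = 1 is the information dimension: use sum(p * log p) instead.
            log_z.append(sum(p * math.log(p) for p in probs))
        else:
            log_z.append(math.log(sum(p ** q for p in probs)) / (q - 1.0))
    return slope(log_eps, log_z)

if __name__ == "__main__":
    boxes = [1, 2, 4, 8, 16, 32, 64]
    cascade = binomial_cascade(0.3, 10)
    for q in (-2.0, 0.0, 1.0, 2.0):
        print(q, generalized_dimension(cascade, q, boxes))
    # One common summary of the degree of multifractality (an assumption here).
    delta = generalized_dimension(cascade, -2.0, boxes) - generalized_dimension(cascade, 2.0, boxes)
    print("delta D:", delta)
```

A uniform measure yields D_q = 1 for every q in this setting, whereas an uneven measure yields a D_q that decreases as q grows; the spread between the two ends is what a multifractal analysis quantifies.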
Genome-wide association studies (GWAS) are essential for determining the genetic basis of ecological or economic phenotypic variation across individuals within populations of model and non-model organisms. For this research question, current practice is to replicate the GWAS with different parameters and models to validate the reproducibility of the results. However, straightforward methodologies that manage both replication and tetraploid data are still missing. To solve this problem, we designed MultiGWAS, a tool that performs GWAS for diploid and tetraploid organisms by running four software tools in parallel: two for polyploid data (GWASpoly and SHEsis) and two for diploid data (PLINK and TASSEL). MultiGWAS has several advantages. It runs either from the command line or through an interface. It manages different genotype formats, including VCF. It executes both the full and naive models using several quality filters. In addition, it calculates a score to choose the best gene action model across GWASpoly and TASSEL. Finally, it generates several reports that facilitate the identification of false associations from both the significant and the best-ranked associated SNPs across the four tools. We tested MultiGWAS with tetraploid potato data. The execution demonstrated that the Venn diagram and the other companion reports (i.e., Manhattan and QQ plots, heatmaps for associated SNP profiles, and chord diagrams to trace associated SNPs by chromosome) were useful for identifying associated SNPs shared among different models and parameters. Therefore, we confirmed that MultiGWAS is a suitable wrapper tool that successfully handles GWAS replication in both diploid and tetraploid organisms.
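The idea behind the Venn-diagram report, finding SNPs that several tools agree on, can be sketched in a few lines. This is only an illustration under stated assumptions: the function name, the `min_tools` parameter, and the SNP identifiers are hypothetical, and the sketch does not reproduce MultiGWAS's actual code or output format.

```python
from collections import Counter

def shared_snps(hits_by_tool, min_tools=2):
    """Return {snp: number_of_tools} for SNPs reported by at least `min_tools` tools."""
    # set() guards against a tool listing the same SNP more than once.
    counts = Counter(snp for hits in hits_by_tool.values() for snp in set(hits))
    return {snp: n for snp, n in counts.items() if n >= min_tools}

if __name__ == "__main__":
    # Hypothetical significant SNPs per tool (identifiers are made up).
    hits = {
        "GWASpoly": ["snp_A", "snp_B", "snp_C"],
        "SHEsis": ["snp_B", "snp_C"],
        "PLINK": ["snp_C", "snp_D"],
        "TASSEL": ["snp_C"],
    }
    for snp, n in sorted(shared_snps(hits).items()):
        print(snp, "reported by", n, "tools")
```

Raising `min_tools` tightens the agreement required before a SNP is kept, which mirrors the use of replication across tools to screen out likely false associations.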
This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
Hosted file: MultiGWASv14.pdf, available at https://authorea.com/users/358323/articles/480510-multigwas-anintegrative-tool-for-genome-wide-association-studies-gwas-in-tetraploid-organisms