Recent studies have shown that, in ancient polyploids, one of the parental subgenomes is generally dominant: it has retained more genes and is more highly expressed, a phenomenon termed subgenome dominance. The genomic features that determine which subgenome becomes dominant in a newly formed polyploid, and how quickly, remain poorly understood. To investigate the rate at which subgenome dominance emerges, we examined gene expression, gene methylation, and transposable element (TE) methylation in a natural, <140-year-old allopolyploid (Mimulus peregrinus), a resynthesized interspecies triploid hybrid (M. robertsii), a resynthesized allopolyploid (M. peregrinus), and the progenitor species (M. guttatus and M. luteus). We show that subgenome expression dominance arises immediately after the hybridization of divergent genomes and increases significantly over subsequent generations. Additionally, CHH methylation levels near genes and within TEs are reduced in the first-generation hybrid, intermediate in the resynthesized allopolyploid, and repatterned differently between the dominant and recessive subgenomes in the natural allopolyploid. Subgenome differences in TE methylation levels mirror the increase in expression bias observed over the generations following hybridization. These findings provide insight into the genomic and epigenomic shock that follows hybridization and polyploidization events and may also help uncover the mechanistic basis of heterosis and subgenome dominance.
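Expression dominance of this kind is usually assessed over homeolog pairs. The sketch below assumes a set of paired expression values (for example TPM) for the two homeologs of each gene, counts pairs biased toward each subgenome beyond a hypothetical log2 fold-change threshold, and applies an exact two-sided sign test to the biased pairs. The function names, threshold, and pseudocount are illustrative assumptions, not the statistics used in the study.

```python
import math
from typing import Sequence, Tuple


def homeolog_bias(
    expr_a: Sequence[float],
    expr_b: Sequence[float],
    min_log2fc: float = 1.0,
    pseudocount: float = 1.0,
) -> Tuple[int, int]:
    """Count homeolog pairs biased toward subgenome A or toward subgenome B.

    Hypothetical helper: expr_a[i] and expr_b[i] are expression values
    (e.g. TPM) for the two homeologs of pair i; the threshold and the
    pseudocount are illustrative choices.
    """
    if len(expr_a) != len(expr_b):
        raise ValueError("expression values must be paired")
    toward_a = toward_b = 0
    for a, b in zip(expr_a, expr_b):
        log2fc = math.log2((a + pseudocount) / (b + pseudocount))
        if log2fc >= min_log2fc:
            toward_a += 1
        elif log2fc <= -min_log2fc:
            toward_b += 1
        # pairs inside the threshold are left uncounted
    return toward_a, toward_b


def sign_test_p(k: int, n: int) -> float:
    """Exact two-sided binomial (sign) test of k successes in n trials, p = 0.5."""
    if n == 0:
        return 1.0
    probs = [math.comb(n, i) / 2**n for i in range(n + 1)]
    observed = probs[k]
    # two-sided p: total probability of outcomes no more likely than the observed one
    # (the small tolerance guards against floating-point ties)
    return min(1.0, sum(p for p in probs if p <= observed * (1 + 1e-9)))
```

For example, `homeolog_bias([10, 50, 3, 80, 20], [2, 10, 3, 5, 25])` returns `(3, 0)`, and `sign_test_p(3, 3)` returns 0.25. Ignoring pairs below the threshold, rather than counting them as unbiased, is one of several reasonable conventions.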
Deep learning methodologies have revolutionized prediction in many fields and show potential to do the same in molecular biology and genetics. However, applying these methods in their current forms ignores evolutionary dependencies within biological systems and can result in false positives and spurious conclusions. We developed two approaches that account for evolutionary relatedness in machine learning models: (i) gene-family-guided splitting and (ii) ortholog contrasts. The first approach accounts for evolution by constraining model training and testing sets to include different gene families. The second approach uses evolutionarily informed comparisons between orthologous genes both to control for and to leverage evolutionary divergence during training. Both approaches were explored and validated in the context of mRNA expression level prediction and achieved area under the ROC curve (auROC) values ranging from 0.75 to 0.94. Inspection of model weights revealed biologically interpretable patterns, leading to the hypothesis that the 3′ UTR is more important for fine-tuning mRNA abundance levels while the 5′ UTR is more important for large-scale changes.

Keywords: machine learning | convolutional neural networks | regulation | RNA

Machine and deep learning approaches such as Convolutional Neural Networks (CNNs) are largely responsible for a recent paradigm shift in image and natural language processing. These approaches are among the fundamental enablers of modern artificial intelligence advances such as facial recognition, speech recognition, and self-driving vehicles.
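Gene-family-guided splitting can be sketched as assigning whole gene families, rather than individual genes, to either the training or the test set. The minimal version below assumes a precomputed mapping from gene IDs to gene-family IDs (how families are defined is left open); the function name and the greedy fill strategy are illustrative choices, not the authors' implementation.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple


def family_guided_split(
    gene_to_family: Dict[str, str],
    test_fraction: float = 0.2,
    seed: int = 0,
) -> Tuple[List[str], List[str]]:
    """Split genes into train/test sets so that no gene family spans both sets.

    Hypothetical helper: gene_to_family maps gene IDs to family IDs.
    """
    # group gene IDs by family
    families: Dict[str, List[str]] = defaultdict(list)
    for gene, family in gene_to_family.items():
        families[family].append(gene)

    # shuffle family IDs reproducibly (sorted first so the result depends only on seed)
    family_ids = sorted(families)
    random.Random(seed).shuffle(family_ids)

    target_test_size = test_fraction * len(gene_to_family)
    train: List[str] = []
    test: List[str] = []
    for family in family_ids:
        # whole families go to the test set until it reaches the target size
        if len(test) < target_test_size:
            test.extend(families[family])
        else:
            train.extend(families[family])
    return sorted(train), sorted(test)
```

Because families are assigned whole, the realized test fraction only approximates `test_fraction`, and a single large family can overshoot it. Libraries that offer group-aware splitters follow the same idea of treating a group label as the unit of assignment.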
The same deep learning approaches are beginning to be applied to molecular biology, genetics, agriculture, and medicine (1-7), but evolutionary relationships make properly training and testing models in biology much more challenging than the image or text classification problems mentioned above. For example, if one wants to predict mRNA levels from DNA promoter regions (as we do here), the standard approach from image recognition problems would be to randomly split genes into training and testing sets (8). However, such a split will likely create dependencies between the sets because genes share evolutionary histories (i.e., gene family relatedness, gene duplications, etc.), which may cause model overfitting and false-positive, spurious conclusions. Models trained without properly accounting for the constraints imposed by evolutionary history (and perhaps other biological and technical factors specific to the modeling scenario) will likely memorize both the neutral and the functional evolutionary history rather than learning only the functional elements, leading researchers to incorrect conclusions.

With these challenges in mind, we developed two CNN architectures for predicting mRNA expression levels from DNA promoter and/or terminator regions. These include models that predict (i) whether a given gene is highly or lowly expressed and (ii) which of two compared gene orthologs has higher mRNA abundance. The architectures are ...
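For the second model type, each training example is a pair of orthologs labelled by which member has higher mRNA abundance. The sketch below builds only those labels, not the CNN inputs; it assumes a list of ortholog pairs and a dictionary of expression values, and the `min_log2_diff` cutoff for discarding near-ties, the pseudocount, and the optional swapped copies used to balance classes are hypothetical choices.

```python
import math
from typing import Dict, List, Tuple


def ortholog_contrasts(
    pairs: List[Tuple[str, str]],
    expression: Dict[str, float],
    min_log2_diff: float = 1.0,
    pseudocount: float = 1.0,
    symmetric: bool = True,
) -> List[Tuple[str, str, int]]:
    """Label ortholog pairs: 1 if the first gene has higher abundance, else 0.

    Hypothetical helper: the cutoff, pseudocount, and swapping are illustrative.
    """
    examples: List[Tuple[str, str, int]] = []
    for first, second in pairs:
        diff = math.log2(expression[first] + pseudocount) - math.log2(
            expression[second] + pseudocount
        )
        if abs(diff) < min_log2_diff:
            continue  # near-tie: direction is ambiguous, so the pair is skipped
        label = 1 if diff > 0 else 0
        examples.append((first, second, label))
        if symmetric:
            # swapped copy with the flipped label keeps the two classes balanced
            examples.append((second, first, 1 - label))
    return examples
```

With the hypothetical pairs `("geneA_sp1", "geneA_sp2")` and `("geneB_sp1", "geneB_sp2")` and expression values 31, 3, 5, and 6, the first pair yields `("geneA_sp1", "geneA_sp2", 1)` plus its swapped copy, while the second pair is dropped as a near-tie.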
Although the sequence of evolutionary events that produced multiple C4 subtypes within the Paniceae remains undetermined, the results presented here are consistent with only a subset of currently proposed models. The species used in this study constitute a panel of C3 and C4 grasses that are suitable for further studies on C4 photosynthesis, bioenergy, food and forage crops, and various developmental features of the Paniceae.
The past few years have witnessed a paradigm shift in molecular systematics from phylogenetic methods (using one or a few genes) to those that can be described as phylogenomics (phylogenetic inference with entire genomes). One approach that has recently emerged is phylo-transcriptomics (transcriptome-based phylogenetic inference). As in any phylogenetics experiment, accurate orthology inference is critical to phylo-transcriptomics. To date, most analyses have inferred orthology based either on pure sequence similarity or using gene-tree approaches. The use of conserved genome synteny in orthology detection has been relatively under-employed in phylogenetics, mainly due to the cost of sequencing genomes. While current trends focus on the quantity of genes included in an analysis, the use of synteny is likely to improve the quality of ortholog inference. In this study, we combine de novo transcriptome data and sequenced genomes from an economically important group of grass species, the tribe Paniceae, to make phylogenomic inferences. This method, which we call “genome-guided phylo-transcriptomics”, is compared to other recently published orthology inference pipelines, and benchmarked using a set of sequenced genomes from across the grasses. These comparisons provide a framework for future researchers to evaluate the costs and benefits of adding sequenced genomes to transcriptome data sets.
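One generic way to add synteny to similarity-based orthology is to keep a reciprocal best hit (RBH) only when nearby genes are also RBH pairs. The sketch below assumes precomputed best-hit tables (for example from an all-against-all similarity search) and, for brevity, a single ordered gene list per genome; the window size `k`, the `min_anchors` cutoff, and all function names are hypothetical, and this is a simple synteny filter rather than the genome-guided pipeline described in the study.

```python
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]


def reciprocal_best_hits(best_ab: Dict[str, str], best_ba: Dict[str, str]) -> Set[Pair]:
    """Pairs (a, b) where b is a's best hit in genome B and a is b's best hit in A."""
    return {(a, b) for a, b in best_ab.items() if best_ba.get(b) == a}


def _window(order: List[str], index: Dict[str, int], gene: str, k: int) -> List[str]:
    """Up to k genes on either side of `gene` in a single ordered gene list."""
    i = index[gene]
    return order[max(0, i - k):i] + order[i + 1:i + 1 + k]


def syntenic_orthologs(
    rbh: Set[Pair],
    order_a: List[str],
    order_b: List[str],
    k: int = 2,
    min_anchors: int = 1,
) -> Set[Pair]:
    """Keep RBH pairs whose neighbourhoods share at least `min_anchors` other RBH pairs.

    Hypothetical helper: assumes one chromosome/scaffold ordering per genome.
    """
    idx_a = {g: i for i, g in enumerate(order_a)}
    idx_b = {g: i for i, g in enumerate(order_b)}
    kept: Set[Pair] = set()
    for a, b in rbh:
        if a not in idx_a or b not in idx_b:
            continue  # no positional information, so synteny cannot be assessed
        near_a = _window(order_a, idx_a, a, k)
        near_b = set(_window(order_b, idx_b, b, k))
        # count neighbouring gene pairs that are themselves reciprocal best hits
        anchors = sum(1 for x in near_a for y in near_b if (x, y) in rbh)
        if anchors >= min_anchors:
            kept.add((a, b))
    return kept
```

In this framing, an RBH pair whose neighbours have no reciprocal counterparts nearby is treated as a possible paralog or misassignment and is set aside, which is how synteny can improve ortholog quality at the cost of ortholog quantity.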
Premise: Whole‐genome duplications (WGDs) are prevalent throughout the evolutionary history of plants. For example, dozens of WGDs have been phylogenetically localized across the order Brassicales, specifically within the family Brassicaceae. A WGD event has also been identified in the Cleomaceae, the sister family to Brassicaceae, yet its placement, as well as that of WGDs in other families in the order, remains unclear.

Methods: Phylo‐transcriptomic data were generated and used to infer a nuclear phylogeny for 74 Brassicales taxa. Genome survey sequencing was also performed on 66 of those taxa to infer a chloroplast phylogeny. These phylogenies were used to assess and confirm relationships among the major families of the Brassicales and within Brassicaceae. Multiple WGD inference methods were then used to assess the placement of WGDs on the nuclear phylogeny.

Results: Well‐supported chloroplast and nuclear phylogenies for the Brassicales are presented, along with the putative placement of the Cleomaceae‐specific WGD event Th‐α. This work also provides evidence for previously hypothesized WGDs, including a well‐supported event shared by at least two members of the Resedaceae family and a possible event within the Capparaceae.

Conclusions: Phylogenetic inference and the placement of WGDs within highly polyploid lineages continue to be major challenges. This study adds to the conversation on the difficulties of WGD inference by demonstrating that sampling is especially important for WGD identification and phylogenetic placement. Given its economic importance and genomic resources, the Brassicales continues to be an ideal group for assessing WGD inference methods.