We are at the beginning of a genomic revolution in which all known species are planned to be sequenced. Accessing such data for comparative analyses is crucial in this new age of data-driven biology. Here, we introduce an improved version of DIAMOND that greatly exceeds previous search performance and harnesses supercomputing to perform tree-of-life scale protein alignments in hours, while matching the sensitivity of the gold standard BLASTP.
Animal and plant development starts with a constituting phase called embryogenesis, which evolved independently in both lineages. Comparative anatomy of vertebrate development--based on the Meckel-Serrès law and von Baer's laws of embryology from the early nineteenth century--shows that embryos from various taxa appear different in early stages, converge to a similar form during mid-embryogenesis, and again diverge in later stages. This morphogenetic series is known as the embryonic 'hourglass', and its bottleneck of high conservation in mid-embryogenesis is referred to as the phylotypic stage. Recent analyses in zebrafish and Drosophila embryos provided convincing molecular support for the hourglass model, because during the phylotypic stage the transcriptome was dominated by ancient genes and global gene expression profiles were reported to be most conserved. Although extensively explored in animals, an embryonic hourglass has not been reported in plants, which represent the second major kingdom in the tree of life that evolved embryogenesis. Here we provide phylotranscriptomic evidence for a molecular embryonic hourglass in Arabidopsis thaliana, using two complementary approaches. This is particularly significant because the possible absence of an hourglass based on morphological features in plants suggests that morphological and molecular patterns might be uncoupled. Together with the reported developmental hourglass patterns in animals, these findings indicate convergent evolution of the molecular hourglass and a conserved logic of embryogenesis across kingdoms.
Seed germination is a critical stage in the plant life cycle and the first step toward successful plant establishment. Therefore, understanding germination is of considerable ecological and agronomic relevance. Previous research revealed that different seed compartments (testa, endosperm, and embryo) control germination, but little is known about the underlying spatial and temporal transcriptome changes that lead to seed germination. We analyzed genome-wide expression in germinating Arabidopsis (Arabidopsis thaliana) seeds with both temporal and spatial detail and provide Web-accessible visualizations of the data reported (vseed.nottingham.ac.uk). We show the potential of this high-resolution data set for the construction of meaningful coexpression networks, which provide insight into the genetic control of germination. The data set reveals two transcriptional phases during germination that are separated by testa rupture. The first phase is marked by large transcriptome changes as the seed switches from a dry, quiescent state to a hydrated and active state. At the end of this first transcriptional phase, the number of differentially expressed genes between consecutive time points drops. This number increases again at testa rupture, the start of the second transcriptional phase. Transcriptome data indicate a role for mechano-induced signaling at this stage and subsequently highlight the fates of the endosperm and radicle: senescence and growth, respectively. Finally, using a phylotranscriptomic approach, we show that expression levels of evolutionarily young genes drop during the first transcriptional phase and increase during the second phase. Evolutionarily old genes show an opposite pattern, suggesting a more conserved transcriptome prior to the completion of germination.
The developmental hourglass model has been used to describe the morphological transitions of related species throughout embryogenesis. Recently, quantifiable approaches combining transcriptomic and evolutionary information provided novel evidence for the presence of a phylotranscriptomic hourglass pattern across kingdoms. As its biological function is unknown, it remains speculative whether this pattern is functional or merely represents a nonfunctional evolutionary relic. The latter would seriously hamper future experimental approaches designed to test hypotheses regarding its function. Here, we address this question by generating transcriptome divergence index (TDI) profiles across embryogenesis of Danio rerio, Drosophila melanogaster, and Arabidopsis thaliana. To enable meaningful evaluation of the resulting patterns, we develop a statistical test that specifically assesses potential hourglass patterns. Based on this objective measure, we find that two of these profiles follow a statistically significant hourglass pattern with the most conserved transcriptomes in the phylotypic periods. As the TDI considers only recent evolutionary signals, this indicates that the phylotranscriptomic hourglass pattern is not a rudiment but possibly actively maintained, implicating the existence of some linked biological function associated with embryogenesis in extant species.
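To make the TDI profile more concrete, the following is a minimal Python sketch. It assumes the commonly used definition of the TDI as the expression-weighted mean of a per-gene divergence value (for example a dN/dS ratio) at each developmental stage; the abstract itself does not spell out the formula. The function name `transcriptome_divergence_index` and the toy numbers are hypothetical illustrations, not data or code from the study, and the statistical test for hourglass patterns is not reproduced here.

```python
from typing import List, Sequence


def transcriptome_divergence_index(
    divergence: Sequence[float],
    expression: Sequence[Sequence[float]],
) -> List[float]:
    """Return one TDI value per stage.

    divergence: one divergence value per gene (assumed here: dN/dS).
    expression: one row per gene, one column per developmental stage.
    """
    if len(divergence) == 0 or len(divergence) != len(expression):
        raise ValueError("need one expression row per divergence value")
    n_stages = len(expression[0])
    if any(len(row) != n_stages for row in expression):
        raise ValueError("all expression rows must cover the same stages")

    profile = []
    for stage in range(n_stages):
        total = sum(row[stage] for row in expression)
        if total <= 0:
            raise ValueError("total expression per stage must be positive")
        # Weight each gene's divergence by its expression at this stage.
        weighted = sum(d * row[stage] for d, row in zip(divergence, expression))
        profile.append(weighted / total)
    return profile


# Toy data (hypothetical): three genes, three stages.
divergence = [0.1, 0.5, 0.9]
expression = [
    [1.0, 8.0, 1.0],  # conserved gene, peaks mid-development
    [1.0, 1.0, 1.0],  # uniformly expressed gene
    [8.0, 1.0, 8.0],  # divergent gene, low mid-development
]
tdi_profile = transcriptome_divergence_index(divergence, expression)
print(tdi_profile)
```

In this toy example the middle stage receives the lowest TDI because the conserved gene dominates expression there, which is the high-low-high shape that an hourglass test would look for.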
Comparison is a fundamental method of scientific research leading to insights about the processes that generate similarity or dissimilarity. In statistical terms comparisons between probability functions are performed to infer connections, correlations, or relationships between objects or samples (Cha 2007). Most quantification methods rely on distance or similarity measures, but the right choice for each individual application is not always clear and sometimes poorly explored. The reason for this is partly that diverse measures are either implemented in different R packages with very different notations or are not implemented at all. Thus, a comprehensive framework implementing the most common similarity and distance measures using a uniform notation is still missing. The R (R Core Team 2018) package Philentropy aims to fill this gap by implementing forty-six fundamental distance and similarity measures (Cha 2007) for comparing probability functions. These comparisons between probability functions have their foundations in a broad range of scientific disciplines from mathematics to ecology. The aim of this package is to provide a comprehensive and computationally optimized base framework for clustering, classification, statistical inference, goodness-of-fit, non-parametric statistics, information theory, and machine learning tasks that are based on comparing univariate or multivariate probability functions. All functions are written in C++ and are integrated into the R package using the Rcpp Application Programming Interface (API) (Eddelbuettel 2013).
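As an illustration of what comparing two probability functions with a distance measure involves, here is a minimal pure-Python sketch of three measures from the family discussed above: Euclidean distance, Kullback-Leibler divergence, and Jensen-Shannon divergence. This is not the Philentropy API (the package is written for R with a C++ backend); the function names, the validation helper, and the choice of base-2 logarithms are assumptions made for this example.

```python
import math
from typing import Sequence


def _validate(p: Sequence[float], q: Sequence[float]) -> None:
    """Check that p and q are probability vectors of equal length."""
    if len(p) != len(q) or len(p) == 0:
        raise ValueError("p and q must be non-empty and of equal length")
    for vec in (p, q):
        if any(x < 0 for x in vec) or abs(sum(vec) - 1.0) > 1e-9:
            raise ValueError("each vector must be non-negative and sum to 1")


def euclidean(p: Sequence[float], q: Sequence[float]) -> float:
    _validate(p, q)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def kullback_leibler(p: Sequence[float], q: Sequence[float], base: float = 2.0) -> float:
    """KL(p || q); terms with p_i == 0 contribute 0, q_i == 0 < p_i gives inf."""
    _validate(p, q)
    total = 0.0
    for a, b in zip(p, q):
        if a == 0:
            continue
        if b == 0:
            return math.inf
        total += a * math.log(a / b, base)
    return total


def jensen_shannon(p: Sequence[float], q: Sequence[float], base: float = 2.0) -> float:
    """Symmetric, bounded divergence built from KL against the midpoint."""
    _validate(p, q)
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return 0.5 * kullback_leibler(p, m, base) + 0.5 * kullback_leibler(q, m, base)


p = [0.5, 0.5]
q = [1.0, 0.0]
print(euclidean(p, q), kullback_leibler(p, q), jensen_shannon(p, q))
```

The example also shows why the choice of measure matters: for these two vectors the Kullback-Leibler divergence is infinite because `q` assigns zero probability to an outcome that `p` supports, whereas the Jensen-Shannon divergence stays finite and symmetric.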
Retrotransposons have played an important role in the evolution of host genomes [1,2]. Their impact is mainly deduced from the composition of DNA sequences that have been fixed over evolutionary time [2]. Such studies provide important “snapshots” reflecting the historical activities of transposons but do not predict current transposition potential. We previously reported Sequence-Independent Retrotransposon Trapping (SIRT) as a method that, by identification of extrachromosomal linear DNA (eclDNA), revealed the presence of active LTR retrotransposons in Arabidopsis [3]. However, SIRT cannot be applied to large and transposon-rich genomes, as found in crop plants. We have developed an alternative approach named ALE-seq (amplification of LTR of eclDNAs followed by sequencing) for such situations. ALE-seq reveals sequences of 5’ LTRs of eclDNAs after two-step amplification: in vitro transcription and subsequent reverse transcription. Using ALE-seq in rice, we detected eclDNAs for a novel Copia family LTR retrotransposon, Go-on, which is activated by heat stress. Sequencing of rice accessions revealed that Go-on has preferentially accumulated in indica rice grown at higher temperatures. Furthermore, ALE-seq applied to tomato fruits identified a developmentally regulated Gypsy family of retrotransposons. A bioinformatic pipeline adapted for ALE-seq data analyses is used for the direct and reference-free annotation of new, active retroelements. This pipeline allows assessment of LTR retrotransposon activities in organisms for which genomic sequences and/or reference genomes are either unavailable or of low quality.
Motivation: Retrieval and reproducible functional annotation of genomic data are crucial in biology. However, the current poor usability and transparency of retrieval methods hinder reproducibility. Here we present an open source R package, biomartr, which provides a comprehensive easy-to-use framework for automating data retrieval and functional annotation for meta-genomic approaches. The functions of biomartr achieve a high degree of clarity, transparency and reproducibility of analyses. Results: The biomartr package implements straightforward functions for bulk retrieval of all genomic data or data for selected genomes, proteomes, coding sequences and annotation files present in databases hosted by the National Center for Biotechnology Information (NCBI) and European Bioinformatics Institute (EMBL-EBI). In addition, biomartr communicates with the BioMart database for functional annotation of retrieved sequences. Comprehensive documentation of biomartr functions and five tutorial vignettes provide step-by-step instructions on how to use the package in a reproducible manner. Availability and Implementation: The open source biomartr package is available at https://github.com/HajkD/biomartr and https://cran.r-project.org/web/packages/biomartr/index.html. Supplementary information: Supplementary data are available at Bioinformatics online.
Retrotransposons containing long terminal repeats (LTRs) form a substantial fraction of eukaryotic genomes. The timing of past transposition can be estimated by quantifying the accumulation of mutations in initially identical LTRs. In this way, retrotransposons are divided into young, potentially mobile elements and old elements that moved thousands or even millions of years ago. Both types are found within a single retrotransposon family, and it is assumed that the old members will remain immobile and degenerate further. Here, we provide evidence in Arabidopsis that old members enter into replication/transposition cycles through high rates of intra-family recombination. The recombination occurs pairwise, resembling the formation of recombinant retroviruses. Thus, each transposition burst generates a novel progeny population of chromosomally integrated LTR retrotransposons consisting of pairwise recombination products produced in a process comparable to the sexual exchange of genetic information. Our observations provide an explanation for the reported high rates of sequence diversification in retrotransposons.
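The age estimate from mutations accumulated in initially identical LTRs can be sketched as follows. This is one common approach, assumed here rather than taken from the abstract: the divergence K between the 5' and 3' LTR of one element is computed from their alignment with a Jukes-Cantor correction, and the insertion time is T = K / (2r), where r is the substitution rate per site per year. The function names, the toy sequences, and the rate of 1e-8 are hypothetical placeholders; a real analysis would use a rate appropriate for the species.

```python
import math


def jukes_cantor_distance(seq_a: str, seq_b: str) -> float:
    """Jukes-Cantor corrected divergence between two aligned sequences.

    Alignment columns containing a gap ('-') in either sequence are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = mismatches / compared
    if p >= 0.75:
        raise ValueError("sequences too divergent for the Jukes-Cantor model")
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)


def ltr_insertion_age(ltr_5prime: str, ltr_3prime: str, rate_per_site_per_year: float) -> float:
    """Estimate years since insertion as T = K / (2r)."""
    if rate_per_site_per_year <= 0:
        raise ValueError("substitution rate must be positive")
    k = jukes_cantor_distance(ltr_5prime, ltr_3prime)
    # Both LTRs accumulate mutations independently, hence the factor of 2.
    return k / (2.0 * rate_per_site_per_year)


# Toy aligned LTR pair (hypothetical): 20 sites, one substitution.
ltr_5 = "ACGTACGTACGTACGTACGT"
ltr_3 = "ACGTACGTACTTACGTACGT"
assumed_rate = 1e-8  # placeholder value, not a measured rate
age_years = ltr_insertion_age(ltr_5, ltr_3, assumed_rate)
print(round(age_years))
```

Note that an estimate of this kind presupposes that the two LTRs of an element diverged only by point mutation after insertion; the recombination between family members described above would break that assumption for the affected elements.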