The plant Arabidopsis thaliana occurs naturally in many different habitats throughout Eurasia. As a foundation for identifying genetic variation contributing to adaptation to diverse environments, a 1001 Genomes Project to sequence geographically diverse A. thaliana strains has been initiated. Here we present the first phase of this project, based on population-scale sequencing of 80 strains drawn from eight regions throughout the species' native range. We describe the majority of common small-scale polymorphisms as well as many larger insertions and deletions in the A. thaliana pan-genome, their effects on gene function, and the patterns of local and global linkage among these variants. The action of processes other than spontaneous mutation is identified by comparing the spectrum of mutations that have accumulated since A. thaliana diverged from its closest relative 10 million years ago with the spectrum observed in the laboratory. Recent species-wide selective sweeps are rare, and potentially deleterious mutations are more common in marginal populations.
The Arabidopsis thaliana transcription factor APETALA2 (AP2) has numerous functions, including roles in seed development, stem cell maintenance, and specification of floral organ identity. To understand the relationship between these different roles, we mapped direct targets of AP2 on a genome-wide scale in two tissue types. We find that AP2 binds to thousands of loci in the developing flower, many of which exhibit AP2-dependent transcription. Opposing, logical effects are evident in AP2 binding to two microRNA genes that influence AP2 expression, with AP2 positively regulating miR156 and negatively regulating miR172, forming a complex direct feedback loop, which also includes all but one of the AP2-like miR172 target clade members. We compare the genome-wide direct target repertoire of AP2 with that of SCHLAFMÜTZE, a closely related transcription factor that also represses the transition to flowering. We detect clear similarities and important differences in the direct target repertoires that are also tissue specific. Finally, using an inducible expression system, we demonstrate that AP2 has dual molecular roles. It functions as both a transcriptional activator and repressor, directly inducing the expression of the floral repressor AGAMOUS-LIKE15 and directly repressing the transcription of floral activators such as SUPPRESSOR OF OVEREXPRESSION OF CONSTANS1.
The appropriate timing of flowering is crucial for plant reproductive success. It is therefore not surprising that intricate genetic networks have evolved to perceive and integrate both endogenous and environmental signals, such as carbohydrate and hormonal status, photoperiod and temperature. In contrast to our detailed understanding of the vernalization pathway, little is known about how flowering time is controlled in response to changes in the ambient growth temperature. In Arabidopsis thaliana, the MADS-box transcription factor genes FLOWERING LOCUS M (FLM) and SHORT VEGETATIVE PHASE (SVP) have key roles in this process. FLM is subject to temperature-dependent alternative splicing. Here we report that the two main FLM protein splice variants, FLM-β and FLM-δ, compete for interaction with the floral repressor SVP. The SVP-FLM-β complex is predominantly formed at low temperatures and prevents precocious flowering. By contrast, the competing SVP-FLM-δ complex is impaired in DNA binding and acts as a dominant-negative activator of flowering at higher temperatures. Our results reveal a new mechanism that controls the timing of the floral transition in response to changes in ambient temperature. A better understanding of how temperature controls the molecular mechanisms of flowering will be important for coping with current changes in global climate.
MicroRNAs (miRNAs) are processed from primary transcripts that contain partially self-complementary foldbacks. As in animals, the core microprocessor in plants is a Dicer protein, DICER-LIKE1 (DCL1). Processing accuracy and strand selection are greatly enhanced by the RNA binding protein HYPONASTIC LEAVES 1 (HYL1) and the zinc finger protein SERRATE (SE). We have combined a luciferase-based genetic screen with whole-genome sequencing for rapid identification of new regulators of miRNA biogenesis and action. Among the first six mutants analyzed were three alleles of C-TERMINAL DOMAIN PHOSPHATASE-LIKE 1 (CPL1)/FIERY2 (FRY2). In the miRNA processing complex, SE functions as a scaffold to mediate CPL1 interaction with HYL1, which needs to be dephosphorylated for optimal activity. In the absence of CPL1, HYL1 dephosphorylation, and hence accurate processing and strand selection from miRNA duplexes, are compromised. Our findings thus define a new regulatory step in plant miRNA biogenesis.
Transposable elements (TEs) are often the primary determinant of genome size differences among eukaryotes. In plants, the proliferation of TEs is countered through epigenetic silencing mechanisms that prevent mobility. Recent studies using the model plant Arabidopsis thaliana have revealed that methylated TE insertions are often associated with reduced expression of nearby genes, and these insertions may be subject to purifying selection due to this effect. Less is known about the genome-wide patterns of epigenetic silencing of TEs in other plant species. Here, we compare the 24-nt siRNA complement from A. thaliana and a closely related congener with a two- to threefold higher TE copy number, Arabidopsis lyrata. We show that TEs, particularly siRNA-targeted TEs, are associated with reduced gene expression within both species and also with gene expression differences between orthologs. In addition, A. lyrata TEs are targeted by a lower fraction of uniquely matching siRNAs, which are associated with more effective silencing of TE expression. Our results suggest that the efficacy of RNA-directed DNA methylation silencing is lower in A. lyrata, a finding that may shed light on the causes of differential TE proliferation among species.
Keywords: gene silencing | transposons
Background: The new research field of metagenomics is providing exciting insights into various, previously unclassified ecological systems. Next-generation sequencing technologies are producing a rapid increase in environmental data in public databases. There is great need for specialized software solutions and statistical methods for dealing with complex metagenome data sets. Methodology/Principal Findings: To facilitate the development and improvement of metagenomic tools and the planning of metagenomic projects, we introduce a sequencing simulator called MetaSim. Our software can be used to generate collections of synthetic reads that reflect the diverse taxonomical composition of typical metagenome data sets. Based on a database of given genomes, the program allows the user to design a metagenome by specifying the number of genomes present at different levels of the NCBI taxonomy, and then to collect reads from the metagenome using a simulation of a number of different sequencing technologies. A population sampler optionally produces evolved sequences based on source genomes and a given evolutionary tree. Conclusions/Significance: MetaSim allows the user to simulate individual read datasets that can be used as standardized test scenarios for planning sequencing projects or for benchmarking metagenomic software.
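To make the read-simulation idea concrete, here is a minimal Python sketch of drawing labeled reads from a designed metagenome. It is not MetaSim's code or interface: the function name `simulate_reads`, its parameters, the convention that an abundance is a share of reads, and the uniform substitution error model are all assumptions for illustration. MetaSim's taxonomy-based design, technology-specific error models, and evolved-population sampler are not reproduced here.

```python
import random


def simulate_reads(genomes, abundances, n_reads, read_len, error_rate=0.0, seed=0):
    """Draw fixed-length reads from a synthetic metagenome (hypothetical helper).

    genomes: dict mapping a genome name to its sequence (string over ACGT).
    abundances: dict mapping the same names to relative abundances; here an
        abundance is the expected share of reads, not of cells (a simplification).
    Returns a list of (source_name, read) tuples so benchmarks know the truth.
    """
    for name, seq in genomes.items():
        if len(seq) < read_len:
            raise ValueError(f"genome {name!r} is shorter than read_len")
    rng = random.Random(seed)  # seeded so test scenarios are reproducible
    names = list(genomes)
    weights = [abundances[name] for name in names]
    reads = []
    for _ in range(n_reads):
        # Choose the source genome in proportion to its abundance.
        name = rng.choices(names, weights=weights, k=1)[0]
        seq = genomes[name]
        # Choose a uniformly random start position within that genome.
        start = rng.randrange(len(seq) - read_len + 1)
        read = list(seq[start:start + read_len])
        # Uniform substitution errors: a crude stand-in for a platform-specific error model.
        for i, base in enumerate(read):
            if rng.random() < error_rate:
                read[i] = rng.choice([b for b in "ACGT" if b != base])
        reads.append((name, "".join(read)))
    return reads
```

For example, `simulate_reads({"g1": "ACGT" * 50, "g2": "GGCC" * 50}, {"g1": 3, "g2": 1}, n_reads=1000, read_len=50, error_rate=0.01)` should yield roughly three reads from `g1` for every one from `g2`. Because each read keeps its source label, the output can serve as ground truth when benchmarking a binning or classification tool.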
We present whole-genome assemblies of four divergent Arabidopsis thaliana strains that complement the 125-Mb reference genome sequence released a decade ago. Using a newly developed reference-guided approach, we assembled large contigs from 9 to 42 Gb of Illumina short-read data from the Landsberg erecta (Ler-1), C24, Bur-0, and Kro-0 strains, which have been sequenced as part of the 1,001 Genomes Project for this species. Using alignments against the reference sequence, we first reduced the complexity of the de novo assembly and later integrated reads without similarity to the reference sequence. For example, half of the noncentromeric C24 genome was covered by scaffolds that are longer than 260 kb, with a maximum of 2.2 Mb. Moreover, over 96% of the reference genome was covered by the reference-guided assembly, compared with only 87% with a complete de novo assembly. Comparisons with 2 Mb of dideoxy sequence revealed that the per-base error rate of the reference-guided assemblies was below 1 in 10,000. Our assemblies provide a detailed, genome-wide picture of large-scale differences between A. thaliana individuals, most of which are difficult to access with alignment-consensus methods alone. We demonstrate their practical relevance in studying the expression differences of polymorphic genes and show how the analysis of sRNA sequencing data can lead to erroneous conclusions if aligned against the reference genome alone. Genome assemblies, raw reads, and further information are accessible through http://1001genomes.org/projects/assemblies.html.
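The scaffold statistic quoted above (half of the noncentromeric genome covered by scaffolds longer than 260 kb) is of the kind usually reported as an NG50, although the abstract does not name the metric. A minimal sketch of that computation follows; the function name `ng50` and the use of a reference genome size as the denominator are our assumptions, not the authors' code.

```python
def ng50(scaffold_lengths, genome_size):
    """Return the largest length L such that scaffolds of length >= L together
    span at least half of genome_size (the NG50). Returns 0 if the whole
    assembly spans less than half of genome_size."""
    covered = 0
    # Walk scaffolds from longest to shortest, accumulating covered bases.
    for length in sorted(scaffold_lengths, reverse=True):
        covered += length
        if 2 * covered >= genome_size:
            return length
    return 0
```

With scaffolds of 100, 80, 50, and 30 units and a genome size of 400, the running total first reaches half the genome at the 50-unit scaffold, so `ng50([100, 80, 50, 30], 400)` returns 50. Passing the assembly's own total length instead of a reference size gives the ordinary N50.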
Despite great advances in sequencing technologies, generating functional information for nonmodel organisms remains a challenge. One solution lies in an improved ability to predict genetic circuits based on primary DNA sequence in combination with detailed knowledge of regulatory proteins that have been characterized in model species. Here, we focus on the LEAFY (LFY) transcription factor, a conserved master regulator of floral development. Starting with biochemical and structural information, we built a biophysical model describing LFY DNA binding specificity in vitro that accurately predicts in vivo LFY binding sites in the Arabidopsis thaliana genome. Applying the model to other plant species, we could follow the evolution of the regulatory relationship between LFY and the AGAMOUS (AG) subfamily of MADS box genes and show that this link predates the divergence between monocots and eudicots. Remarkably, our model succeeds in detecting the connection between LFY and AG homologs despite extensive variation in binding sites. This demonstrates that the cis-element fluidity recently observed in animals also exists in plants, but the challenges it poses can be overcome with predictions grounded in a biophysical model. Therefore, our work opens new avenues to deduce the structure of regulatory networks from mere inspection of genomic sequences.
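The abstract does not spell out the model, so the sketch below shows only the generic idea behind sequence-based binding-site prediction: a log-odds position weight matrix scanned over both DNA strands, whose score is commonly used as an approximation to relative binding energy when positions are assumed to contribute independently. The published LFY model is more detailed (it was built from biochemical and structural data); the helper names, the pseudocount, the uniform background, and the toy four-base motif used in testing are all hypothetical.

```python
import math

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def log_odds_matrix(counts, pseudocount=1.0, background=0.25):
    """Turn per-position base counts (a list of dicts) into log2-odds scores
    against a uniform background (an assumption of this sketch)."""
    matrix = []
    for column in counts:
        total = sum(column.get(b, 0) for b in BASES) + 4 * pseudocount
        matrix.append({
            b: math.log2((column.get(b, 0) + pseudocount) / total / background)
            for b in BASES
        })
    return matrix


def score_site(matrix, site):
    """Sum per-position scores, i.e. treat positions as independent."""
    return sum(matrix[i][base] for i, base in enumerate(site))


def best_site(matrix, sequence):
    """Return (start, strand, score) of the best window on either strand,
    or None if the sequence is shorter than the motif. Only A/C/G/T are
    supported; other characters raise KeyError."""
    sequence = sequence.upper()
    width = len(matrix)
    best = None
    for start in range(len(sequence) - width + 1):
        forward = sequence[start:start + width]
        reverse = forward.translate(COMPLEMENT)[::-1]  # reverse complement
        for strand, site in (("+", forward), ("-", reverse)):
            score = score_site(matrix, site)
            if best is None or score > best[2]:
                best = (start, strand, score)
    return best
```

A genome-scale scan would report every window above a score threshold rather than only the best hit; returning the single best window keeps the sketch short.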