Genomes at the species level are dynamic, with genes present in every individual (core) and genes in a subset of individuals (dispensable) that collectively constitute the pan-genome. Using transcriptome sequencing of seedling RNA from 503 maize (Zea mays) inbred lines to characterize the maize pan-genome, we identified 8681 representative transcript assemblies (RTAs) with 16.4% expressed in all lines and 82.7% expressed in subsets of the lines. Interestingly, with linkage disequilibrium mapping, 76.7% of the RTAs with at least one single nucleotide polymorphism (SNP) could be mapped to a single genetic position, distributed primarily throughout the nonpericentromeric portion of the genome. Stepwise iterative clustering of RTAs suggests, within the context of the genotypes used in this study, that the maize genome is restricted and further sampling of seedling RNA within this germplasm base will result in minimal discovery. Genome-wide association studies based on SNPs and transcript abundance in the pan-genome revealed loci associated with the timing of the juvenile-to-adult vegetative and vegetative-to-reproductive developmental transitions, two traits important for fitness and adaptation. This study revealed the dynamic nature of the maize pan-genome and demonstrated that a substantial portion of variation may lie outside the single reference genome for a species.
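The core/dispensable distinction above can be illustrated with a minimal sketch. It assumes a hypothetical presence/absence table (transcript name mapped to one boolean per inbred line); the function name, the toy data, and the idea of a pre-computed boolean "expressed" call are illustrative assumptions, not the study's actual pipeline or thresholds.

```python
from typing import Dict, List


def classify_pan_genome(presence: Dict[str, List[bool]]) -> Dict[str, str]:
    """Label each transcript by how many lines it was detected in.

    'core'        -> detected in every line
    'dispensable' -> detected in at least one, but not all, lines
    'absent'      -> detected in no line (or no lines were supplied)
    """
    labels: Dict[str, str] = {}
    for transcript, calls in presence.items():
        if calls and all(calls):
            labels[transcript] = "core"
        elif any(calls):
            labels[transcript] = "dispensable"
        else:
            labels[transcript] = "absent"
    return labels


# Hypothetical toy table: three transcripts scored across three inbred lines.
toy_presence = {
    "RTA_0001": [True, True, True],
    "RTA_0002": [True, False, True],
    "RTA_0003": [False, False, False],
}
labels = classify_pan_genome(toy_presence)

# Fraction of detected transcripts that are core in the toy data.
detected = [t for t, lab in labels.items() if lab != "absent"]
core_fraction = sum(labels[t] == "core" for t in detected) / len(detected)
```

In real data, the boolean calls would come from an expression or alignment threshold chosen by the analyst, and that choice affects which transcripts count as core.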
We report de novo genome assemblies, transcriptomes, annotations, and methylomes for the 26 inbreds that serve as the founders for the maize nested association mapping population. The number of pan-genes in these diverse genomes exceeds 103,000, with approximately a third found across all genotypes. The results demonstrate that the ancient tetraploid character of maize continues to degrade by fractionation to the present day. Excellent contiguity over repeat arrays and complete annotation of centromeres revealed additional variation in major cytological landmarks. We show that combining structural variation with single-nucleotide polymorphisms can improve the power of quantitative mapping studies. We also document variation at the level of DNA methylation and demonstrate that unmethylated regions are enriched for cis-regulatory elements that contribute to phenotypic variation.
Background: Sequencing technology and assembly algorithms have matured to the point that high-quality de novo assembly is possible for large, repetitive genomes. Current assemblies traverse transposable elements (TEs) and provide an opportunity for comprehensive annotation of TEs. Numerous methods exist for annotation of each class of TEs, but their relative performances have not been systematically compared. Moreover, a comprehensive pipeline is needed to produce a non-redundant library of TEs for species lacking this resource to generate whole-genome TE annotations. Results: We benchmark existing programs based on a carefully curated library of rice TEs. We evaluate the performance of methods annotating long terminal repeat (LTR) retrotransposons, terminal inverted repeat (TIR) transposons, short TIR transposons known as miniature inverted transposable elements (MITEs), and Helitrons. Performance metrics include sensitivity, specificity, accuracy, precision, FDR, and F1. Using the most robust programs, we create a comprehensive pipeline called Extensive de-novo TE Annotator (EDTA) that produces a filtered non-redundant TE library for annotation of structurally intact and fragmented elements. EDTA also deconvolutes nested TE insertions frequently found in highly repetitive genomic regions. Using other model species with curated TE libraries (maize and Drosophila), EDTA is shown to be robust across both plant and animal species. Conclusions: The benchmarking results and pipeline developed here will greatly facilitate TE annotation in eukaryotic genomes. These annotations will promote a much more in-depth understanding of the diversity and evolution of TEs at both intra- and inter-species levels. EDTA is open-source and freely available: https://github.com/oushujun/EDTA.
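The six performance metrics named in this abstract have standard confusion-matrix definitions, sketched below. The function name and the example counts are hypothetical; the abstract does not say what unit is counted (for example, annotated base pairs versus whole elements), so the inputs here are assumed to be pre-computed true/false positive and negative totals in whatever unit the benchmark uses.

```python
from typing import Dict


def benchmark_metrics(tp: float, fp: float, tn: float, fn: float) -> Dict[str, float]:
    """Compute standard classification metrics from confusion-matrix totals.

    tp/fp/tn/fn are assumed to be non-negative totals in a single unit.
    A ratio with a zero denominator is reported as 0.0 rather than raising.
    """

    def ratio(numerator: float, denominator: float) -> float:
        return numerator / denominator if denominator else 0.0

    sensitivity = ratio(tp, tp + fn)             # true positive rate (recall)
    specificity = ratio(tn, tn + fp)             # true negative rate
    accuracy = ratio(tp + tn, tp + fp + tn + fn)
    precision = ratio(tp, tp + fp)               # positive predictive value
    fdr = ratio(fp, tp + fp)                     # false discovery rate
    f1 = ratio(2 * precision * sensitivity, precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "precision": precision,
        "FDR": fdr,
        "F1": f1,
    }


# Hypothetical totals, chosen only to make the arithmetic easy to follow.
metrics = benchmark_metrics(tp=80, fp=20, tn=890, fn=10)
```

FDR is the complement of precision, and F1 is the harmonic mean of precision and sensitivity, so a tool that over-annotates loses on precision, FDR, and F1 even when its sensitivity is high.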
Comprehensive and systematic transcriptome profiling provides valuable insight into biological and developmental processes that occur throughout the life cycle of a plant. We have enhanced our previously published microarray-based gene atlas of maize (Zea mays L.) inbred B73 to now include 79 distinct replicated samples that have been interrogated using RNA sequencing (RNA-seq). The current version of the atlas includes 50 original array-based gene atlas samples, a time-course of 12 stalk and leaf samples postflowering, and an additional set of 17 samples from the maize seedling and adult root system. The entire dataset contains 4.6 billion mapped reads, with an average of 20.5 million mapped reads per biological replicate, allowing for detection of genes with lower transcript abundance. As the new root samples represent key additions to the previously examined tissues, we highlight insights into the root transcriptome, which is represented by 28,894 (73.2%) annotated genes in maize. Additionally, we observed remarkable expression differences across both the longitudinal (four zones) and radial gradients (cortical parenchyma and stele) of the primary root supported by fourfold differential expression of 9353 and 4728 genes, respectively. Among the latter were 1110 genes that encode transcription factors, some of which are orthologs of previously characterized transcription factors known to regulate root development in Arabidopsis thaliana (L.) Heynh., while most are novel, and represent attractive targets for reverse genetics approaches to determine their roles in this important organ. This comprehensive transcriptome dataset is a powerful tool toward understanding maize development, physiology, and phenotypic diversity.
Transposable elements (TEs) account for a large portion of the genome in many eukaryotic species. Despite their reputation as “junk” DNA or genomic parasites deleterious for the host, TEs have complex interactions with host genes and the potential to contribute to regulatory variation in gene expression. It has been hypothesized that TEs and genes they insert near may be transcriptionally activated in response to stress conditions. The maize genome, with many different types of TEs interspersed with genes, provides an ideal system to study the genome-wide influence of TEs on gene regulation. To analyze the magnitude of the TE effect on gene expression response to environmental changes, we profiled gene and TE transcript levels in maize seedlings exposed to a number of abiotic stresses. Many genes exhibit up- or down-regulation in response to these stress conditions. The analysis of TE families inserted within upstream regions of up-regulated genes revealed that between four and nine different TE families are associated with up-regulated gene expression in each of these stress conditions, affecting up to 20% of the genes up-regulated in response to abiotic stress, and as many as 33% of genes that are only expressed in response to stress. Expression of many of these same TE families also responds to the same stress conditions. The analysis of the stress-induced transcripts and proximity of the transposon to the gene suggests that these TEs may provide local enhancer activities that stimulate stress-responsive gene expression. Our data on allelic variation for insertions of several of these TEs show strong correlation between the presence of TE insertions and stress-responsive up-regulation of gene expression. Our findings suggest that TEs provide an important source of allelic regulatory variation in gene response to abiotic stress in maize.
Sequencing technology and assembly algorithms have matured to the point that high-quality de novo assembly is possible for large, repetitive genomes. Current assemblies traverse transposable elements (TEs) and allow for annotation of TEs. There are numerous methods for each class of elements with unknown relative performance metrics. We benchmarked existing programs based on a curated library of rice TEs. Using the most robust programs, we created a comprehensive pipeline called Extensive de-novo TE Annotator (EDTA) that produces a condensed TE library for annotations of structurally intact and fragmented elements. EDTA is open-source and freely available: https://github.com/oushujun/EDTA.
Keywords: Transposable element; Annotation; Genome; Benchmarking; Pipeline
Long-read sequencing (e.g., PacBio and Oxford Nanopore) and assembly scaffolding (e.g., Hi-C and BioNano) techniques have progressed rapidly within the last few years. These innovations have been critical for high-quality assembly of the repetitive fraction of genomes. In fact, Ou et al. [8] demonstrated that the assembly contiguity of repetitive sequences in recent long-read assemblies is even better than traditional BAC-based reference genomes. With these developments, inexpensive and high-quality assembly of an entire genome is now possible.
Knowing where features (i.e., genes, TEs, etc.) exist in a genome assembly is important information for using these assemblies for biological findings. However, unlike the relatively straightforward and comprehensive pipelines established for gene annotation [9][10][11], current methods for TE annotation can be piecemeal, inaccurate, and are highly specific to classes of transposable elements.
Transposable elements fall into two major classes. Class I elements, also known as retrotransposons, use an RNA intermediate in their "copy and paste" mechanism of transposition [12]. Class I elements can be further divided into long terminal repeat (LTR) retrotransposons, as well as those that lack LTRs (non-LTRs), which include long interspersed nuclear elements (LINEs), and short interspersed nuclear elements (SINEs). Structural features of these elements can facilitate automated de novo annotation in a genome assembly. For example, LTR elements have a 5-bp target site duplication (TSD), while non-LTRs have either variable length TSDs or lack TSDs entirely, being instead associated with deletion of flanking sequences upon insertion [13]. There are also standard terminal sequences associated with LTR elements (i.e., 5'-TG…C/G/TA-3' for LTR-Copia and 5'-TG…CA-3' for LTR-Gypsy elements), and non-LTRs often have a terminal poly-A tail at the 3' end of the element (see [14] for a complete description of structural features of each superfamily).
The second major class of TEs, Class II elements, also known as DNA transposons, use a DNA intermediate in their "cut and paste" mechanism of transposition [15]. As with Class I ...
Cultivated potato (Solanum tuberosum L.), a vegetatively propagated autotetraploid, has been bred for distinct market classes, including fresh market, pigmented, and processing varieties. Breeding efforts have relied on phenotypic selection of populations developed from intra- and intermarket class crosses and introgressions of wild and cultivated Solanum relatives. To retrospectively explore the effects of potato breeding at the genome level, we used 8303 single-nucleotide polymorphism markers to genotype a 250-line diversity panel composed of wild species, genetic stocks, and cultivated potato lines with release dates ranging from 1857 to 2011. Population structure analysis revealed four subpopulations within the panel, with cultivated potato lines grouping together and separate from wild species and genetic stocks. With pairwise kinship estimates, clear separation between potato market classes was observed. Modern breeding efforts have scarcely changed the percentage of heterozygous loci or the frequency of homozygous, single-dose, and duplex loci on a genome level, despite concerted efforts by breeders. In contrast, clear selection in less than 50 years of breeding was observed for alleles in biosynthetic pathways important for market class-specific traits such as pigmentation and carbohydrate composition. Although improvement and diversification for distinct market classes were observed through whole-genome analysis of historic and current potato lines, an increased rate of gain from selection will be required to meet growing global food demands and challenges due to climate change. Understanding the genetic basis of diversification and trait improvement will allow for more rapid genome-guided improvement of potato in future breeding efforts.