Recent years have seen a surge in plant genome sequencing projects and in the comparison of multiple related individuals. The high degree of genomic variation observed led to the realisation that single reference genomes do not represent the diversity within a species, and to the expansion of the pan-genome concept. Pan-genomes represent the genomic diversity of a species and include core genes, found in all individuals, as well as variable genes, which are absent in some individuals. Variable gene annotations often show similarities across plant species, with genes for biotic and abiotic stress commonly enriched within variable gene groups. Here we review the growth of pan-genomics in plants, explore the origins of gene presence/absence variation and show how pan-genomes can support plant breeding and evolution studies.
Pan-genomes in plants: beginnings and current status
The concept of pan-genomes was first developed in bacteria in 2005 1, where the sequencing of several isolates of Streptococcus agalactiae revealed a core genome represented by 80% of S. agalactiae genes, with the other 20% being absent in at least one isolate 1. However, it took almost 10 years for plant pan-genomes to be constructed after the initial bacterial pan-genome work. This was partially due to the expense of data generation, but also to the expectation that there would be very little gene presence/absence variation (PAV) in higher organisms, which do not exchange genetic material as freely as bacteria 2. The first publication to apply the term pan-genome in plants appeared in 2007, where it described short variable regions in the rice and maize genomes 3. However, the extent of gene presence/absence was not understood at that time due to the lack of accurate whole genome assemblies for multiple individuals of the same species.
As DNA sequencing costs declined, it became feasible to undertake whole genome comparisons within species, and three general approaches for pan-genome assembly were developed 4,5 (Figure 1). The first method developed was whole genome assembly and comparison, where the genomes of multiple individuals are assembled and then compared. This was later complemented by the iterative assembly and presence/absence variation calling approach, where genomic reads from multiple individuals are aligned to a reference, and non-aligning reads are assembled and added to the growing pan-genome reference. Subsequent remapping of all reads to the pan-genome permits PAV calling across the population. More recently there have been rapid developments in graph-based pan-genome assembly, where a graph representing genomic diversity and conservation is constructed 6.
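The PAV calling step and the core/variable distinction described above can be illustrated with a minimal sketch. It assumes that reads have already been remapped to the pan-genome and summarised as the fraction of each gene covered in each individual. The function names, the input layout and the 0.5 coverage threshold are hypothetical choices for illustration, not values taken from the reviewed studies.

```python
from typing import Dict, List, Tuple

# Hypothetical input layout: gene -> individual -> fraction of the gene's
# length covered by remapped reads (0.0 to 1.0).
Coverage = Dict[str, Dict[str, float]]
PavMatrix = Dict[str, Dict[str, bool]]


def call_pav(coverage: Coverage, min_fraction: float = 0.5) -> PavMatrix:
    """Call a gene present in an individual when its covered fraction
    reaches min_fraction (an assumed, illustrative threshold)."""
    return {
        gene: {ind: frac >= min_fraction for ind, frac in per_ind.items()}
        for gene, per_ind in coverage.items()
    }


def classify_genes(pav: PavMatrix) -> Tuple[List[str], List[str]]:
    """Split genes into core (present in all individuals) and
    variable (absent in at least one individual)."""
    core: List[str] = []
    variable: List[str] = []
    for gene, calls in pav.items():
        if all(calls.values()):
            core.append(gene)
        else:
            variable.append(gene)
    return sorted(core), sorted(variable)


if __name__ == "__main__":
    # Toy data for two individuals; values are made up for the example.
    toy_coverage: Coverage = {
        "geneA": {"ind1": 0.98, "ind2": 0.95},
        "geneB": {"ind1": 0.90, "ind2": 0.10},
    }
    core_genes, variable_genes = classify_genes(call_pav(toy_coverage))
    print("core:", core_genes)
    print("variable:", variable_genes)
```

In practice the threshold and the coverage summary depend on sequencing depth and the mapping tool used, so both would need to be tuned for a real dataset.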
Summary In the last decade, the revolution in sequencing technologies has deeply impacted crop genotyping practice. New methods allowing rapid, high‐throughput genotyping of entire crop populations have proliferated and opened the door to wider use of molecular tools in plant breeding. These new genotyping‐by‐sequencing (GBS) methods include over a dozen reduced‐representation sequencing (RRS) approaches and at least four whole‐genome resequencing (WGR) approaches. The diversity of methods available, each often producing different types of data at different cost, can make selection of the best‐suited method seem a daunting task. We review the most common genotyping methods used today and compare their suitability for linkage mapping, genomewide association studies (GWAS), marker‐assisted and genomic selection and genome assembly and improvement in crops with various genome sizes and complexity. Furthermore, we give an outline of bioinformatics tools for analysis of genotyping data. WGR is well suited to genotyping biparental cross populations with complex, small‐ to moderate‐sized genomes and provides the lowest cost per marker data point. RRS approaches differ in their suitability for various tasks, but demonstrate similar costs per marker data point. These approaches are generally better suited for de novo applications and more cost‐effective when genotyping populations with large genomes or high heterozygosity. We expect that although RRS approaches will remain the most cost‐effective for some time, WGR will become more widespread for crop genotyping as sequencing costs continue to decrease.
Summary With the rapid increase in the global population and the impact of climate change on agriculture, there is a need for crops with higher yields and greater tolerance to abiotic stress. However, traditional crop improvement via genetic recombination or random mutagenesis is a laborious process and cannot keep pace with increasing crop demand. Genome editing technologies such as clustered regularly interspaced short palindromic repeat (CRISPR)/CRISPR‐associated protein (CRISPR/Cas) allow targeted modification of almost any crop genome sequence to generate novel variation and accelerate breeding efforts. We expect a gradual shift in crop improvement away from traditional breeding towards cycles of targeted genome editing. Crop improvement using genome editing is not constrained by limited existing variation or the requirement to select alleles over multiple breeding generations. However, current applications of crop genome editing are limited by the lack of complete reference genomes, the sparse knowledge of potential modification targets, and the unclear legal status of edited crops. We argue that overcoming technical and social barriers to the application of genome editing will allow this technology to produce a new generation of high‐yielding, climate ready crops.
The reconstruction of reticulate evolutionary histories in plants is still a major methodological challenge. Sequences of the ITS nrDNA are a popular marker to analyze hybrid relationships, but variation of this multicopy spacer region is affected by concerted evolution, high intraindividual polymorphism, and shifts in mode of reproduction. The relevance of changes in secondary structure is still under dispute. We aim to shed light on the extent of polymorphism within and between sexual species and their putative natural as well as synthetic hybrid derivatives in the Ranunculus auricomus complex to test morphology-based hypotheses of hybrid origin and parentage of taxa. We employed direct sequencing of ITS nrDNA from 68 individuals representing three sexuals, their synthetic hybrids and one sympatric natural apomict, as well as cloning of ITS copies in four representative individuals, RNA secondary structure analysis, and landmark geometric morphometric analysis on leaves. Phylogenetic network analyses indicate additivity of parental ITS variants in both synthetic and natural hybrids. The triploid synthetic hybrids are genetically much closer to their maternal progenitors, probably due to ploidy dosage effects, although exhibiting a paternal-like leaf morphology. The natural hybrids are genetically and morphologically closer to the putative paternal progenitor species. Secondary structures of ITS1-5.8S-ITS2 were rather conserved in all taxa. The observed similarities in ITS polymorphisms suggest that the natural apomict R. variabilis is an ancient hybrid of the diploid sexual species R. notabilis and the sexual species R. cassubicifolius. The additivity pattern shared by R. variabilis and the synthetic hybrids supports an evolutionary and biogeographical scenario that R. variabilis originated from ancient hybridization. Concerted evolution of ITS copies in R. variabilis is incomplete, probably due to a shift to asexual reproduction. 
Under the condition of comprehensive inter- and intraspecific sampling, ITS polymorphisms are powerful for elucidating reticulate evolutionary histories.
Domestication and breeding have reshaped the genomic architecture of chicken, but the retention and loss of genomic elements during these evolutionary processes remain unclear. We present the first chicken pan-genome, constructed using 664 individuals, which identified an additional ∼66.5 Mb of sequence absent from the reference genome (GRCg6a). The constructed pan-genome encoded 20,491 predicted protein-coding genes, with conserved genes showing higher expression levels than dispensable genes. Presence/absence variation (PAV) analyses demonstrated that gene PAV in chicken was shaped by selection, genetic drift, and hybridization. PAV-based GWAS identified numerous candidate mutations related to growth, carcass composition, meat quality, or physiological traits. Among them, a deletion in the promoter region of IGF2BP1 affecting chicken body size is reported, which is supported by functional studies and additional samples. This is the first report of the causal variant underlying the repeatedly reported chicken body size QTL on chromosome 27. Therefore, the chicken pan-genome is a useful resource for biological discovery and breeding. It improves our understanding of chicken genome diversity and provides material for unveiling the evolutionary history of chicken domestication.
Climate change is impacting ecosystems globally (Pecl et al., 2017) with increasing temperature and extreme climatic events expected to become more frequent, widespread and persistent through the 21st century (Oliver et al., 2019). In many circumstances, climate change is outpacing the ability of species to adapt, causing mortality, range shifts and new ecosystem states (
Summary Plant genomes demonstrate significant presence/absence variation (PAV) within a species; however, the factors that lead to this variation have not been studied systematically in Brassica across diploids and polyploids. Here, we developed pangenomes of polyploid Brassica napus and its two diploid progenitor genomes B. rapa and B. oleracea to infer how PAV may differ between diploids and polyploids. Modelling of gene loss suggests that loss propensity is primarily associated with transposable elements in the diploids while in B. napus, gene loss propensity is associated with homoeologous recombination. We use these results to gain insights into the different causes of gene loss, both in diploids and following polyploidization, and pave the way for the application of machine learning methods to understanding the underlying biological and physical causes of gene presence/absence.