Here we present SMAP, a software package that implements a suite of computational tools to extract multi-allelic haplotypes using read-backed haplotyping. SMAP tools first perform accurate read processing and analyze read mapping distributions across sample sets. Then, one of two complementary modules can be invoked for haplotype calling. SMAP haplotype-sites combines known Single Nucleotide Polymorphisms (SNPs) and/or read mapping position polymorphisms (SMAPs) to reconstruct compressed, read-reference-encoded haplotype strings. In contrast, SMAP haplotype-window works independently of prior knowledge of polymorphisms: it groups reads by locus, defines a window enclosed between two custom border sequences, and retains the entire corresponding DNA sequence as the haplotype. Among its many applications, haplotype-window is especially useful for high-throughput CRISPR/Cas mutation screens. Either way, SMAP creates a single integrated haplotype call table across all loci and samples. SMAP haplotyping is versatile: it can be applied to highly multiplex amplicon sequencing (HiPlex), shotgun sequencing (e.g. whole genome shotgun (WGS) sequencing, probe capture and RNA-Seq), or Genotyping-by-Sequencing (GBS) data, and to Illumina short reads as well as PacBio and MinION long reads. SMAP creates discrete genotype calls for individuals of any ploidy or quantitative haplotype frequency spectra for Pool-Seq data, and can scale from tens to thousands of loci and/or samples. SMAP is released under the GNU Affero General Public License v3.0. The source code, written in Python, is available at https://gitlab.com/truttink/smap, and a detailed user manual with guidelines for accurate read processing is available at https://ngs-smap.readthedocs.io/.
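The window-based idea described above can be illustrated with a minimal sketch. This is not SMAP's implementation or its API: the function names, the example reads and the border sequences are all hypothetical, and exact string matching stands in for whatever border detection the real tool uses. The sketch only shows the principle of keeping the full sequence between two border sequences as the haplotype and counting the distinct haplotypes among the reads of one locus.

```python
from collections import Counter


def extract_window(read, left_border, right_border):
    """Return the sequence between the two borders, or None if a border is missing.

    Hypothetical helper: exact matching only, no mismatches or reverse complements.
    """
    start = read.find(left_border)
    if start == -1:
        return None
    start += len(left_border)
    end = read.find(right_border, start)
    if end == -1:
        return None
    return read[start:end]


def haplotype_frequencies(reads, left_border, right_border):
    """Count each distinct window sequence and convert the counts to relative frequencies."""
    counts = Counter()
    for read in reads:
        window = extract_window(read, left_border, right_border)
        if window is not None:
            counts[window] += 1
    total = sum(counts.values())
    return {hap: n / total for hap, n in counts.items()} if total else {}


# Made-up reads for one locus: two reference-like reads, one read with a
# 2 bp deletion inside the window, and one read lacking the right border.
reads = [
    "GGACGTCCATGGTTGACC",
    "GGACGTCCATGGTTGACC",
    "GGACGTCCGGTTGACC",
    "GGACGTCCATGG",
]
freqs = haplotype_frequencies(reads, left_border="ACGT", right_border="TTGA")
for haplotype, freq in sorted(freqs.items()):
    print(haplotype, round(freq, 3))
```

Because the whole window sequence is kept, a read carrying an insertion or deletion yields a distinct haplotype without any prior list of variants, which is why this style of calling suits mutation screens. Reads missing either border are discarded here; that is a simplification chosen for the sketch.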
Background: The availability of chromosome-scale genome assemblies is fundamentally important to advance genetics and breeding in crops, as well as for evolutionary and comparative genomics. The improvement of long-read sequencing technologies and the advent of optical mapping and chromosome conformation capture technologies in the last few years have significantly promoted the development of chromosome-scale genome assemblies of model plants and crop species. In grasses, chromosome-scale genome assemblies recently became available for cultivated and wild species of the Triticeae subfamily. Development of state-of-the-art genomic resources in species of the Poeae subfamily, which includes important crops like fescues and ryegrasses, is lagging behind the progress in the cereal species. Results: Here, we report a new chromosome-scale genome sequence assembly for perennial ryegrass, obtained by combining PacBio long-read sequencing, Illumina short-read polishing, BioNano optical mapping and Hi-C scaffolding. More than 90% of the total genome size of perennial ryegrass (approximately 2.55 Gb) is covered by seven pseudo-chromosomes that show high levels of collinearity to the orthologous chromosomes of Triticeae species. The transposon fraction of perennial ryegrass was found to be relatively low, approximately 35% of the total genome content, which is less than half of the genome repeat content of cultivated cereal species. We predicted 54,629 high-confidence gene models, 10,287 long non-coding RNAs and a total of 8,393 short non-coding RNAs in the perennial ryegrass genome. Conclusions: The new reference genome sequence and annotation presented here are valuable resources for comparative genomic studies in grasses, as well as for breeding applications, and will expedite the development of productive varieties in perennial ryegrass and related species.
Germplasm from perennial ryegrass (Lolium perenne L.) natural populations is useful for breeding because of its adaptation to a wide range of climates. Climate‐adaptive genes can be detected from associations between genotype, phenotype and climate but an integrated framework for the analysis of these three sources of information is lacking. We used two approaches to identify adaptive loci in perennial ryegrass and their effect on phenotypic traits. First, we combined Genome‐Environment Association (GEA) and GWAS analyses. Then, we implemented a new test based on a Canonical Correlation Analysis (CANCOR) to detect adaptive loci. Furthermore, we improved the previous perennial ryegrass gene set by de novo gene prediction and functional annotation of 39,967 genes. GEA‐GWAS revealed eight outlier loci associated with both environmental variables and phenotypic traits. CANCOR retrieved 633 outlier loci associated with two climatic gradients, characterized by cold‐dry winter versus mild‐wet winter and long rainy season versus long summer, and pointed out traits putatively conferring adaptation at the extremes of these gradients. Our CANCOR test also revealed the presence of both polygenic and oligogenic climatic adaptations. Our gene annotation revealed that 374 of the CANCOR outlier loci were positioned within or close to a gene. Co‐association networks of outlier loci revealed a potential utility of CANCOR for investigating the interaction of genes involved in polygenic adaptations. The CANCOR test provides an integrated framework to analyse adaptive genomic diversity and phenotypic responses to environmental selection pressures that could be used to facilitate the adaptation of plant species to climate change.
Revealing DNA sequence variation within the Lolium perenne genepool is important for genetic analysis and development of breeding applications. We reviewed current literature on plant development to select candidate genes in pathways that control agronomic traits, and identified 503 orthologues in L. perenne. Using targeted resequencing, we constructed a comprehensive catalogue of genomic variation for a L. perenne germplasm collection of 736 genotypes derived from current cultivars, breeding material and wild accessions. To overcome challenges of variant calling in heterogeneous outbreeding species, we used two complementary strategies to explore sequence diversity. First, four variant calling pipelines were integrated with the VariantMetaCaller to reach maximal sensitivity. Additional multiplex amplicon sequencing was used to empirically estimate an appropriate precision threshold. Second, a de novo assembly strategy was used to reconstruct divergent alleles for each gene. The advantage of this approach was illustrated by discovery of 28 novel alleles of LpSDUF247, a polymorphic gene co-segregating with the S-locus of the grass self-incompatibility system. Our approach is applicable to other genetically diverse outbreeding species. The resulting collection of functionally annotated variants can be mined for variants causing phenotypic variation, either through genetic association studies, or by selecting carriers of rare defective alleles for physiological analyses.