Highly parallel SNP genotyping platforms have been developed for some important crop species, but these platforms typically carry a high cost per sample for first-time or small-scale users. In contrast, recently developed genotyping by sequencing (GBS) approaches offer a highly cost-effective alternative for simultaneous SNP discovery and genotyping. In the present investigation, we have explored the use of GBS in soybean. In addition to developing a novel analysis pipeline to call SNPs and indels from the resulting sequence reads, we have devised a modified library preparation protocol to alter the degree of complexity reduction. We used a set of eight diverse soybean genotypes to conduct a pilot-scale test of the protocol and pipeline. Using ApeKI for GBS library preparation and sequencing on an Illumina GAIIx machine, we obtained 5.5 M reads, which were processed using our pipeline. A total of 10,120 high-quality SNPs were obtained, and the distribution of these SNPs closely mirrored the distribution of gene-rich regions in the soybean genome. A total of 39.5% of the SNPs were present in genic regions and 52.5% of these were located in the coding sequence. Validation of over 400 genotypes at a set of randomly selected SNPs using Sanger sequencing showed a 98% success rate. We then explored the use of selective primers to achieve a greater complexity reduction during GBS library preparation. The number of SNP calls could be increased by almost 40% and their depth of coverage was more than doubled, thus opening the door to an increase in throughput and a significant decrease in the per-sample cost. The approach developed here to obtain high-quality SNPs will be helpful for marker-assisted genomics as well as for the assessment of available genetic resources for effective utilisation in a wide range of species.
Summary: Next‐generation sequencing (NGS) and bioinformatics tools have greatly facilitated the characterization of nucleotide variation; nonetheless, an exhaustive description of both SNP haplotype diversity and of structural variation remains elusive in most species. In this study, we sequenced a representative set of 102 short‐season soya beans and achieved an extensive coverage of both nucleotide diversity and structural variation (SV). We called close to 5M sequence variants (SNPs, MNPs and indels) and noticed that the number of unique haplotypes had plateaued within this set of germplasm (1.7M tag SNPs). This data set proved highly accurate (98.6%) based on a comparison of called genotypes at loci shared with a SNP array. We used this catalogue of SNPs as a reference panel to impute missing genotypes at untyped loci in data sets derived from lower density genotyping tools (150 K GBS‐derived SNPs/530 samples). Of the missing genotypes imputed in this fashion, 96.4% proved to be accurate. Using a combination of three bioinformatics pipelines, we uncovered ~92 K SVs (deletions, insertions, inversions, duplications, CNVs and translocations) and estimated that over 90% of these were accurate. Finally, we noticed that the duplication of certain genomic regions explained much of the residual heterozygosity at SNP loci in otherwise highly inbred soya bean accessions. This is the first time that a comprehensive description of both SNP haplotype diversity and SV has been achieved within a regionally relevant subset of a major crop.
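The accuracy figure above rests on comparing genotype calls at loci shared between two data sets. As a minimal sketch of that kind of comparison, assuming calls are stored as dictionaries keyed by (sample, locus), the function below computes the fraction of shared, non-missing calls that agree. The function name, the data layout and the toy genotypes are all hypothetical and are not taken from the study.

```python
def genotype_concordance(calls_a, calls_b):
    """Return the fraction of shared, non-missing genotype calls that agree.

    calls_a, calls_b: dicts mapping (sample, locus) -> genotype string,
    with None marking a missing call. This layout is an assumption made
    for illustration only.
    """
    shared_keys = set(calls_a) & set(calls_b)
    compared = 0
    matches = 0
    for key in shared_keys:
        a, b = calls_a[key], calls_b[key]
        if a is None or b is None:
            continue  # a missing call on either side cannot be compared
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no shared, non-missing genotype calls to compare")
    return matches / compared


# Toy data (invented for illustration, not results from the paper).
calls_seq = {
    ("S1", "chr1:100"): "AA",
    ("S1", "chr1:200"): "AB",
    ("S2", "chr1:100"): "BB",
    ("S2", "chr1:200"): None,   # missing in the sequencing data
    ("S1", "chr2:50"): "AA",    # locus absent from the array
}
calls_array = {
    ("S1", "chr1:100"): "AA",
    ("S1", "chr1:200"): "AA",
    ("S2", "chr1:100"): "BB",
    ("S2", "chr1:200"): "AB",
}

concordance = genotype_concordance(calls_seq, calls_array)
print(f"concordance over shared calls: {concordance:.3f}")
```

Only calls present and non-missing in both data sets enter the denominator, so loci typed by one platform alone do not affect the estimate.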
In eastern Canada, earliness is an important trait for soybean given the short growing season. The aim of this work was to develop tools for breeders to rapidly identify alleles present in their germplasm at the recently cloned maturity locus E3 (GmPhyA3). The tremendous throughput of modern DNA sequencing technology has allowed the use of genotyping by sequencing (GBS) approaches to identify and genotype thousands of single nucleotide polymorphisms (SNPs) across the entire genome. We have used a GBS protocol and SNP-calling pipeline optimized for soybean to characterize 53 near-isogenic lines (NILs) contrasting for maturity loci. The results clearly showed the suitability of GBS to provide dense SNP coverage and very accurate information on the location and size of introgressed regions. We then developed a GBS haplotype method to characterize 91 plant introductions (PIs) as well as a set of 305 lines representative of the Eastern Canadian germplasm for their allelic status at the GmPhyA3 gene. Six distinct haplotypes in and around the E3 locus were observed. Subsequent tests on two genotypes per haplotype (PCR test for a previously reported allele, sequencing of the entire gene), and validation on a subset of lines, allowed us to determine that each of these corresponded to a different allele of this gene. We found that the functional allele E3Ha and the loss-of-function allele e3-tr were the two most prevalent in the Eastern Canadian germplasm, while the e3-fs allele was found at low frequency and e3-ns was absent. These results show that this approach is a powerful method for rapid allelic characterization, and its application to other maturity genes will be useful for breeding purposes.
Core ideas
• A gene-centric approach for haplotype definition was developed and implemented in R.
• The tool allows for allelic characterization at given loci in germplasm collections.
• Allelic status at four maturity genes is predicted on the basis of marker genotyping data.
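The idea of defining haplotypes around a gene can be sketched in a few lines. The example below is a rough illustration only: it is written in Python rather than R, and it is not the published tool. It assumes a simple rule in which markers lying within the gene plus a flanking distance are selected, and lines sharing the same alleles across those markers are grouped into one haplotype. The function name, the `flank` parameter, the marker positions and the genotypes are all hypothetical.

```python
from collections import defaultdict


def define_haplotypes(genotypes, marker_positions, gene_start, gene_end, flank=0):
    """Group lines by their alleles at markers in and around a gene.

    genotypes: dict mapping line name -> {marker name: allele or None}
    marker_positions: dict mapping marker name -> position (bp)
    Returns the ordered list of markers in the window and a dict mapping
    each haplotype (tuple of alleles) to the lines carrying it.
    """
    lo, hi = gene_start - flank, gene_end + flank
    # Keep markers inside the window, ordered by position.
    window = [
        name
        for name, pos in sorted(marker_positions.items(), key=lambda kv: kv[1])
        if lo <= pos <= hi
    ]
    groups = defaultdict(list)
    for line, calls in genotypes.items():
        haplotype = tuple(calls.get(name) for name in window)
        if None in haplotype:
            continue  # assumption: lines with missing calls are left unassigned
        groups[haplotype].append(line)
    return window, dict(groups)


# Toy data (invented for illustration).
marker_positions = {"m1": 900, "m2": 1050, "m3": 1900, "m4": 5000}
genotypes = {
    "lineA": {"m1": "A", "m2": "C", "m3": "G", "m4": "T"},
    "lineB": {"m1": "A", "m2": "C", "m3": "G", "m4": "A"},
    "lineC": {"m1": "T", "m2": "C", "m3": "G", "m4": "T"},
    "lineD": {"m1": "A", "m2": None, "m3": "G", "m4": "T"},
}

window, haplotypes = define_haplotypes(
    genotypes, marker_positions, gene_start=1000, gene_end=2000, flank=200
)
for hap, lines in haplotypes.items():
    print("-".join(hap), lines)
```

In this toy case, marker m4 falls outside the window, so lineA and lineB share a haplotype despite differing at m4. Linking each haplotype to a known allele would still require independent validation, as described in the abstract above.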