Accurate identification of polymorphisms from sequence data is crucial to unlocking the potential of high-throughput sequencing for genomics. Single nucleotide polymorphisms (SNPs) are difficult to identify accurately in polyploid crops because the duplicated nature of polyploid genomes lowers confidence in the true alignment of short reads. A haplotype-based method that contrasts subgenome-specific sequences leads to higher accuracy of SNP identification in polyploids. To test this method, a large-scale 48K SNP array (Axiom Arachis2) was developed for Arachis hypogaea (peanut), an allotetraploid, in which 1,674 haplotype-based SNPs were included. Results from the array show that 74% of the haplotype-based SNP markers could be validated, which is considerably higher than with previous methods used for peanut. The haplotype method has been implemented in a standalone program, HAPLOSWEEP, which takes BAM files and a VCF file as input and identifies haplotype-based markers. Haplotypes can be discovered within single reads or across paired reads, and the method can leverage long-read technology by targeting haplotypes of any length. Haplotype-based genotyping is applicable to all allopolyploid genomes and provides confidence in marker identification and in silico genotyping for polyploid genomics.
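The haplotype idea described above can be illustrated with a minimal sketch. It is not HAPLOSWEEP's implementation: the read representation (a dict mapping a variant position to the observed base), the function names, the `min_reads` threshold, and the toy data are all assumptions made for illustration. The sketch counts the base combinations seen across a set of variant sites in reads that cover all of them, then reports haplotypes supported in one genotype but not in the other.

```python
from collections import Counter

def haplotype_counts(reads, sites):
    """Count base combinations across `sites` in reads that cover every site."""
    counts = Counter()
    for read in reads:
        if all(pos in read for pos in sites):
            counts[tuple(read[pos] for pos in sites)] += 1
    return counts

def supported_haplotypes(reads, sites, min_reads=2):
    """Keep haplotypes seen in at least `min_reads` reads (hypothetical threshold)."""
    counts = haplotype_counts(reads, sites)
    return {hap for hap, n in counts.items() if n >= min_reads}

def contrasting_haplotypes(reads_a, reads_b, sites, min_reads=2):
    """Return haplotypes found only in genotype A and only in genotype B."""
    haps_a = supported_haplotypes(reads_a, sites, min_reads)
    haps_b = supported_haplotypes(reads_b, sites, min_reads)
    return haps_a - haps_b, haps_b - haps_a

# Toy data (invented): position 100 differs between subgenomes (A vs G),
# position 130 carries a candidate allelic SNP (C vs T) on the "A" haplotype.
SITES = (100, 130)
sample_a = (
    [{100: "A", 130: "C"}] * 3
    + [{100: "G", 130: "C"}] * 3
    + [{100: "A", 130: "T"}]      # single read, treated as likely error
    + [{100: "A"}]                # does not span both sites, ignored
)
sample_b = [{100: "A", 130: "T"}] * 3 + [{100: "G", 130: "C"}] * 3

only_in_a, only_in_b = contrasting_haplotypes(sample_a, sample_b, SITES)
print(only_in_a, only_in_b)
```

In this toy case the shared haplotype (G, C) cancels out, so only the haplotypes that distinguish the two genotypes remain. In practice the reads would come from BAM alignments and the sites from a VCF file, as the abstract states.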
Core Ideas: Finding reliable SNPs in polyploids is challenging. Machine learning is an efficient tool to refine SNP calling from NGS data of polyploids. The SNP‐ML tool was designed to facilitate SNP calling.

Single nucleotide polymorphisms (SNPs) have many advantages as molecular markers since they are ubiquitous and codominant. However, the discovery of true SNPs in polyploid species is difficult. Peanut (Arachis hypogaea L.) is an allopolyploid for which the rate of true SNP calls is very low. A large set of true and false SNPs identified from the Axiom_Arachis 58k array was leveraged to train machine‐learning models to enable identification of true SNPs directly from sequence data and to reduce ascertainment bias. These models achieved accuracy rates above 80% using real peanut RNA sequencing (RNA‐seq) and whole‐genome shotgun (WGS) resequencing data, which is higher than previously reported for polyploids and at least a twofold improvement for peanut. A 48K SNP array, Axiom_Arachis2, was designed using this approach, resulting in 75% accuracy of calling SNPs from different tetraploid peanut genotypes. When the method was used to simulate SNP variation in several polyploids, models achieved >98% accuracy in selecting true SNPs. Additionally, models built with simulated genotypes were able to select true SNPs at >80% accuracy using real peanut data. This work achieved its objective of creating an effective approach for calling highly reliable SNPs from polyploids using machine learning. A novel tool, designated SNP machine learning (SNP‐ML), was developed for predicting true SNPs from sequence data using the described models. SNP‐ML additionally provides functionality to train new models not included in this study for customized use, designated SNP machine learner (SNP‐MLer). SNP‐ML is publicly available.
Postharvest aflatoxin contamination is a challenging issue that affects peanut quality. Aflatoxins, produced by fungi belonging to the Aspergilli group, are an acutely toxic, carcinogenic, and immune-suppressing class of mycotoxins. Evidence for several host genetic factors that may impact aflatoxin contamination has been reported, including genes for lipoxygenase (PnLOX1 and PnLOX2/PnLOX3, which showed either positive or negative regulation with infection), reactive oxygen species, and WRKY (highly associated with or differentially expressed upon infection of maize); however, their roles remain unclear. Therefore, we conducted an RNA-sequencing experiment to differentiate the gene response to fungal infection between resistant (ICG 1471) and susceptible (Florida-07) cultivated peanut genotypes. The gene expression profiling analysis was designed to reveal differentially expressed genes in response to the infection (infected versus mock-treated seeds). In addition, the differential expression of the fungal genes was profiled. The study revealed the complexity of the interaction between the fungus and peanut seeds, as the expression of a large number of genes was altered, including some involved in plant defense against aflatoxin accumulation. Analysis of the experimental data with "keggseq," a newly designed tool for Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis, showed the importance of α-linolenic acid metabolism, protein processing in the endoplasmic reticulum, spliceosome, and carbon fixation and metabolism pathways in conditioning resistance to aflatoxin accumulation. In addition, coexpression network analysis was carried out to reveal the correlation of gene expression among peanut and fungal genes. The results showed the importance of WRKY, Toll/interleukin-1 receptor-nucleotide binding site leucine-rich repeat (TIR-NBS-LRR), ethylene, and heat shock proteins in the resistance mechanism.
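Pathway enrichment of the kind mentioned above is commonly framed as an over-representation test. The sketch below uses a one-sided hypergeometric test. This is a generic textbook approach and an assumption here, since the abstract does not describe the statistic that keggseq uses. The function name and all gene counts are hypothetical.

```python
from math import comb

def hypergeom_enrichment_p(total_genes, pathway_genes, de_genes, overlap):
    """One-sided p-value P(X >= overlap) for pathway over-representation.

    total_genes   : genes in the background set
    pathway_genes : background genes annotated to the pathway
    de_genes      : differentially expressed (DE) genes drawn from the background
    overlap       : DE genes that are annotated to the pathway
    """
    max_overlap = min(pathway_genes, de_genes)
    denom = comb(total_genes, de_genes)
    tail = sum(
        comb(pathway_genes, k) * comb(total_genes - pathway_genes, de_genes - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / denom

# Invented counts: 1,000 background genes, 40 in the pathway,
# 100 DE genes, 12 of which fall in the pathway (4 expected by chance).
p_value = hypergeom_enrichment_p(1000, 40, 100, 12)
print(f"enrichment p-value: {p_value:.3g}")
```

A real analysis would run this once per pathway and then correct the resulting p-values for multiple testing.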