Difficulty in detecting rare variants is one of the problems in conventional genome-wide association studies (GWAS). The problem is closely related to the complex gene compositions comprising multiple alleles, such as haplotypes. Several single nucleotide polymorphism (SNP) set approaches have been proposed to solve this problem. These methods, however, have rarely been discussed in connection with haplotypes. In this study, we developed a novel SNP-set method named "RAINBOW" and applied the method to haplotype-based GWAS by regarding a haplotype block as a SNP-set. By combining haplotype block estimation and SNP-set GWAS, haplotype-based GWAS can be conducted without prior information on haplotypes. We prepared 100 datasets of simulated phenotypic data and real marker genotype data of Oryza sativa subsp. indica, and performed GWAS on the datasets. We compared the power of our method, the conventional single-SNP GWAS, the conventional haplotype-based GWAS, and the conventional SNP-set GWAS. Our proposed method was shown to be superior to these in three aspects: (1) controlling false positives; (2) detecting causal variants without relying on linkage disequilibrium if the causal variants were genotyped in the dataset; and (3) achieving greater power than the other methods, i.e., detecting causal variants that were not detected by the others, primarily when the causal variants were located very close to each other and the directions of their effects were opposite. By using the SNP-set approach as in this study, we expect that detecting not only rare variants but also genes with complex mechanisms, such as genes with multiple causal variants, can be realized. RAINBOW was implemented as an R package named
Background: Difficulty in detecting rare variants is one of the problems in conventional genome-wide association studies (GWAS). The problem is closely related to the complex gene compositions comprising multiple alleles, such as haplotypes. Several single nucleotide polymorphism (SNP) set approaches have been proposed to solve this problem. These methods, however, have rarely been discussed in connection with haplotypes. In this study, we developed a novel SNP-set GWAS method named "RAINBOW" and applied the method to haplotype-based GWAS by regarding a haplotype block as a SNP-set. By combining haplotype block estimation and SNP-set GWAS, haplotype-based GWAS can be conducted without prior information on haplotypes.
Results: We prepared 100 datasets of simulated phenotypic data and real marker genotype data of Oryza sativa subsp. indica, and performed GWAS on the datasets. We compared the power of our method, the conventional single-SNP GWAS, the conventional haplotype-based GWAS, and the conventional SNP-set GWAS. The comparison indicated that the proposed method controlled false positives better than the others. The proposed method was also excellent at detecting causal variants without relying on linkage disequilibrium if the causal variants were genotyped in the dataset. Moreover, the proposed method showed greater power than the other methods, i.e., it was able to detect causal variants that were not detected by the others, especially when the causal variants were located very close to each other and the directions of their effects were opposite.
Conclusion: The proposed method, RAINBOW, is especially superior in controlling false positives, detecting causal variants, and detecting nearby causal variants with opposite effects. By using the SNP-set approach as in the proposed method, we expect that detecting not only rare variants but also genes with complex mechanisms, such as genes with multiple causal variants, can be realized.
RAINBOW was implemented as an R package and is available at https://github.com/KosukeHamazaki/RAINBOW.
Keywords: GWAS; SNP-set; haplotype; mixed effects model; family relatedness
KH developed the method of the novel haplotype-based GWAS with the SNP-set approach, RAINBOW, conducted all statistical analyses, and drafted the manuscript. HI was involved in the conception and design of the study, provided administrative support, and supervised the study. Both authors have read and approved the final manuscript.
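The claim that nearby causal variants with opposite effects are hard to detect one SNP at a time can be illustrated with a toy simulation. The sketch below is not RAINBOW and does not use a mixed model or kinship correction; it only compares the phenotypic variance explained by each SNP alone with the variance explained by the two SNPs modeled jointly, which is the intuition behind treating a block as a SNP-set. The function names, the sample size, the LD level, the effect sizes, and the noise level are all hypothetical choices for illustration.

```python
import random


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def joint_r2(x1, x2, y):
    """R^2 of an ordinary least-squares fit of y on x1 and x2 (with intercept),
    computed from the three pairwise correlations."""
    r1, r2, r12 = pearson_r(x1, y), pearson_r(x2, y), pearson_r(x1, x2)
    return (r1 ** 2 + r2 ** 2 - 2 * r1 * r2 * r12) / (1 - r12 ** 2)


def simulate(n_lines=1000, ld_keep=0.9, noise_sd=0.5, seed=1):
    """Simulate inbred lines with two tightly linked SNPs coded -1/+1.

    snp2 copies snp1 with probability ld_keep (assumed value), so the two
    SNPs are in strong LD. Their effects are +1 and -1 (assumed values),
    so they cancel in every line where the two SNPs carry the same allele.
    """
    rng = random.Random(seed)
    snp1, snp2, y = [], [], []
    for _ in range(n_lines):
        a = rng.choice((-1, 1))
        b = a if rng.random() < ld_keep else -a
        snp1.append(a)
        snp2.append(b)
        y.append(1.0 * a - 1.0 * b + rng.gauss(0.0, noise_sd))
    return snp1, snp2, y


if __name__ == "__main__":
    snp1, snp2, y = simulate()
    print("variance explained by SNP 1 alone:", round(pearson_r(snp1, y) ** 2, 3))
    print("variance explained by SNP 2 alone:", round(pearson_r(snp2, y) ** 2, 3))
    print("variance explained jointly:       ", round(joint_r2(snp1, snp2, y), 3))
```

Under these assumptions each SNP on its own explains only a small share of the phenotypic variance, because its effect is largely masked by the linked SNP with the opposite sign, while the joint fit recovers most of the genetic signal.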
A genome-wide association study (GWAS) needs to have a suitable population. The factors that affect a GWAS (e.g., population structure, sample size, and sequence analysis and field testing costs) need to be considered. Mixed populations containing subpopulations of different genetic backgrounds may be suitable populations. We conducted simulation experiments to see if a population with high genetic diversity, such as a diversity panel, should be added to a target population, especially when the target population harbors small genetic diversity. The target population was 112 accessions of Oryza sativa L. subsp. japonica, mainly developed in Japan. We combined the target population with three populations that had higher genetic diversity. These were 100 indica accessions, 100 japonica accessions, and 100 accessions with various genetic backgrounds. The results showed that the GWAS's power with a mixed population was generally higher than with a separate population. Also, the optimal GWAS populations varied depending on the fixation index (F_ST) of the quantitative trait nucleotides (QTNs) and the polymorphism of QTNs in each population. When a QTN was polymorphic in a target population, a target population combined with a higher diversity population improved the QTN's detection power.
By investigating F_ST and the expected heterozygosity (H_e) as factors influencing the detection power, we
Abbreviations: AUC, area under the curve; CDR, correct detection rate; FN, false negative; FP, false positive; F_ST, fixation index; GWAS, genome-wide association study; H_e, expected heterozygosity; IND, indica population; LD, linkage disequilibrium; MAF, minor allele frequency; QTN, quantitative trait nucleotide; SNP, single nucleotide polymorphism; TJN, temperate japonica with a narrow genetic background; TN, true negative; TP, true positive.
This is an open access article under the terms of the Creative Commons Attribution-NonCommercial License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited and is not used for commercial purposes.
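The two statistics named in this abstract, the fixation index and the expected heterozygosity, can be computed for a single biallelic QTN from its allele frequency in each subpopulation. The following is a minimal sketch using the textbook definition F_ST = (H_T - H_S) / H_T with sample-size weighting; the abstract does not say which estimator the study used, so this form and the function names are assumptions. The population sizes 112 and 100 come from the abstract, while the allele frequencies are hypothetical.

```python
def expected_heterozygosity(p):
    """Expected heterozygosity H_e = 2p(1 - p) at a biallelic locus."""
    return 2.0 * p * (1.0 - p)


def fst(freqs, sizes):
    """Fixation index (H_T - H_S) / H_T for one biallelic locus.

    freqs: allele frequency of the same allele in each subpopulation.
    sizes: number of accessions in each subpopulation, used as weights.
    """
    total = sum(sizes)
    p_bar = sum(p * n for p, n in zip(freqs, sizes)) / total
    h_t = expected_heterozygosity(p_bar)  # heterozygosity of the pooled population
    h_s = sum(expected_heterozygosity(p) * n for p, n in zip(freqs, sizes)) / total
    if h_t == 0.0:
        return 0.0  # locus is monomorphic in the pooled population
    return (h_t - h_s) / h_t


if __name__ == "__main__":
    sizes = (112, 100)  # target population and one added population
    # Hypothetical QTN that is monomorphic in the target population only.
    print("F_ST, QTN fixed in target:", fst((0.0, 0.5), sizes))
    # Hypothetical QTN with the same frequency in both populations.
    print("F_ST, equal frequencies:  ", fst((0.3, 0.3), sizes))
```

A QTN that is fixed in the target population but polymorphic in the added population yields a positive F_ST, whereas a QTN with equal frequencies in both yields a value of essentially zero.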
To enrich carotenoids, especially β-cryptoxanthin, in juice sac tissues of fruits via molecular breeding in citrus, allele mining was utilized to dissect allelic variation of carotenoid metabolic genes and identify an optimum allele on the target loci characterized by expression quantitative trait locus (eQTL) analysis. SNPs of target carotenoid metabolic genes in 13 founders of the Japanese citrus breeding population were explored using the SureSelect target enrichment method. An independent allele was determined based on the presence or absence of reliable SNPs, using trio analysis to confirm inheritance between parent and offspring. Among the 13 founders, there were 7 PSY alleles, 7 HYb alleles, 11 ZEP alleles, 5 NCED alleles, and 4 alleles for the eQTL that control the transcription levels of PDS and ZDS among the ancestral species, indicating that some founders acquired those alleles from them. The carotenoid composition data of 263 breeding pedigrees in juice sac tissues revealed that the phenotypic variance of carotenoid composition was similar to that in the 13 founders, whereas the mean of total carotenoid content increased. This increase in total carotenoid content correlated with an increase in β-cryptoxanthin, violaxanthin, or both in juice sac tissues. Bayesian statistical analysis between the allelic composition of target genes and the carotenoid composition in 263 breeding pedigrees indicated that the PSY-a and ZEP-e alleles at the PSY and ZEP loci had strong positive effects on increasing the total carotenoid content, including β-cryptoxanthin and violaxanthin, in juice sac tissues. Moreover, the pyramiding of these alleles also increased the β-cryptoxanthin content. Interestingly, an offsetting interaction between alleles with increasing and decreasing effects on carotenoid content and an epistatic interaction among carotenoid metabolic genes were observed, and these interactions complicated the carotenoid profiles in the breeding population.
These results revealed that allelic composition strongly influences the carotenoid composition in citrus fruits. The allelic genotype information for the examined carotenoid metabolic genes in major citrus varieties, together with the trio-tagged SNPs that discriminate the optimum alleles (PSY-a and ZEP-e) from the rest, should help citrus breeders enrich carotenoids in fruit via molecular breeding.
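The trio analysis mentioned in this abstract rests on a simple rule: at a reliable SNP, the offspring genotype must be formed from one allele of each parent. The sketch below shows one common way to apply that rule as a filter; the abstract does not describe the study's actual pipeline, so the function names, the SNP identifiers, and the genotype calls are all hypothetical.

```python
from itertools import product


def is_mendelian_consistent(parent1, parent2, child):
    """Return True if the child genotype can arise from the two parents.

    Each genotype is a pair of alleles, e.g. ("A", "G"). Allele order
    within a genotype is ignored.
    """
    possible = {tuple(sorted(pair)) for pair in product(parent1, parent2)}
    return tuple(sorted(child)) in possible


def reliable_snps(trio_calls):
    """Keep the SNPs whose calls are consistent within a parent-parent-child trio.

    trio_calls maps a SNP name to (parent1, parent2, child) genotypes.
    """
    return [name for name, (p1, p2, c) in trio_calls.items()
            if is_mendelian_consistent(p1, p2, c)]


if __name__ == "__main__":
    # Hypothetical SNP names and genotype calls, for illustration only.
    calls = {
        "snp_001": (("A", "A"), ("G", "G"), ("A", "G")),  # consistent
        "snp_002": (("C", "C"), ("C", "C"), ("C", "T")),  # inconsistent: T in neither parent
        "snp_003": (("A", "G"), ("A", "G"), ("G", "G")),  # consistent
    }
    print(reliable_snps(calls))
```

SNPs that fail the check in a known trio are more likely to reflect genotyping or mapping errors than true variation, which is why such a filter is often applied before alleles are defined.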
Plant response to drought is an important yield-related trait under abiotic stress, but the method for measuring and modeling plant responses in a time series has not been fully established. The objective of this study was to develop a method to measure and model plant response to irrigation changes using time-series multispectral (MS) data. We evaluated 178 soybean (Glycine max (L.) Merr.) accessions under three irrigation treatments at the Arid Land Research Center, Tottori University, Japan in 2019, 2020 and 2021. The irrigation treatments included W5: watering for 5 d followed by no watering for 5 d, W10: watering for 10 d followed by no watering for 10 d, D10: no watering for 10 d followed by watering for 10 d, and D: no watering. To capture the plant responses to irrigation changes, time-series MS data were collected by unmanned aerial vehicle during the irrigation/non-irrigation switch of each irrigation treatment. We built a random regression model (RRM) for each combination of treatment and year using the time-series MS data. To test the accuracy of the information captured by the RRM, we evaluated the coefficient of variation (CV) of fresh shoot weight of all accessions under a total of nine different drought conditions as an indicator of a plant's stability under drought stress. We built a genomic prediction model (MTRRM model) using the genetic random regression coefficients of the RRM as secondary traits and evaluated the accuracy of each model for predicting CV. In 2020 and 2021, the mean prediction accuracies of MTRRM models built in the changing irrigation treatments (r = 0.44 and 0.49, respectively) were higher than those in the continuous drought treatment (r = 0.34 and 0.44, respectively) in the same year. When the CV was predicted using the MTRRM model across 2020 and 2021 in the changing irrigation treatment, the mean prediction accuracy (r = 0.46) was 42% higher than that of the simple genomic prediction model (r = 0.32).
The results suggest that this RRM method using the time-series MS data can effectively capture the genetic variation of plant response to drought.
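The target trait in this abstract, the coefficient of variation of fresh shoot weight across drought conditions, and the reported accuracy measure, a correlation r between observed and predicted values, can both be written in a few lines. The sketch below assumes the CV is the sample standard deviation divided by the mean and that r is a Pearson correlation; the abstract does not spell out either choice, and the shoot weights are hypothetical numbers.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """CV = sample standard deviation / mean (assumed definition)."""
    return stdev(values) / mean(values)


def pearson_r(observed, predicted):
    """Pearson correlation, used here as the prediction accuracy."""
    mo, mp = mean(observed), mean(predicted)
    sop = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    soo = sum((o - mo) ** 2 for o in observed)
    spp = sum((p - mp) ** 2 for p in predicted)
    return sop / (soo * spp) ** 0.5


if __name__ == "__main__":
    # Hypothetical fresh shoot weights (g) of two accessions, one value
    # per drought condition (nine conditions, as in the abstract).
    stable = [52, 50, 49, 51, 48, 50, 47, 49, 50]
    unstable = [70, 35, 62, 28, 55, 30, 66, 25, 40]
    print("CV of stable accession:  ", round(coefficient_of_variation(stable), 3))
    print("CV of unstable accession:", round(coefficient_of_variation(unstable), 3))
```

A lower CV marks an accession whose shoot weight changes little across conditions, so predicting CV from genomic and multispectral data amounts to predicting stability under drought.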