Abstract: Background: Elucidation of genotype-to-phenotype relationships is a major challenge in biology. In plants, it is the basis for molecular breeding. Quantitative Trait Locus (QTL) mapping enables to link variation at the trait level to variation at the genomic level. However, QTL regions typically contain tens to hundreds of genes. In order to prioritize such candidate genes, we show that we can identify potentially causal genes for a trait based on overrepresentation of biological processes (gene functions) for t…
“…The results are shown in Supplementary Table S2. To avoid overly broad functional categories, we marked those GO terms higher than 1% within the genome in gray (Bargsten et al, 2014) and excluded them from further analyses (Supplementary Table S2). We found that the most significantly enriched GO biological process (FDR = 8.29E-09) in cluster 5 was fatty acid biosynthetic process (Figure 2B; Supplementary Table S2), which has a temporal expression pattern of “up-down-down” (Figure 2A).…”
Understanding the regulation of lipid metabolism is vital for genetic engineering of canola (Brassica napus L.) to increase oil yield or modify oil composition. We conducted time-series analyses of transcriptomes and proteomes to uncover the molecular networks associated with oil accumulation and dynamic changes in these networks in canola. The expression levels of genes and proteins were measured at 2, 4, 6, and 8 weeks after pollination (WAP). Our results show that the biosynthesis of fatty acids is a dominant cellular process from 2 to 6 WAP, while the degradation mainly happens after 6 WAP. We found that genes in almost every node of fatty acid synthesis pathway were significantly up-regulated during oil accumulation. Moreover, significant expression changes of two genes, acetyl-CoA carboxylase and acyl-ACP desaturase, were detected on both transcriptomic and proteomic levels. We confirmed the temporal expression patterns revealed by the transcriptomic analyses using quantitative real-time PCR experiments. The gene set association analysis show that the biosynthesis of fatty acids and unsaturated fatty acids are the most significant biological processes from 2-4 WAP and 4-6 WAP, respectively, which is consistent with the results of time-series analyses. These results not only provide insight into the mechanisms underlying lipid metabolism, but also reveal novel candidate genes that are worth further investigation for their values in the genetic engineering of canola.
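The filtering step quoted in the citation statement above (excluding GO terms annotated to more than 1% of the genome) can be illustrated with a minimal Python sketch. The function name `drop_broad_terms`, the gene identifiers, and the GO term labels are hypothetical placeholders, and the exact procedure used by the citing authors is not given in the excerpt, so this is only one plausible reading of it.

```python
from collections import Counter


def drop_broad_terms(gene_to_go, max_fraction=0.01):
    """Remove GO terms annotated to more than max_fraction of all genes.

    gene_to_go   : dict mapping gene id -> iterable of GO terms
    max_fraction : genome-wide frequency above which a term counts as
                   'overly broad' (1% in the quoted statement)
    Returns (filtered annotation dict, set of removed terms).
    """
    n_genes = len(gene_to_go)
    # Count each term at most once per gene.
    counts = Counter(t for terms in gene_to_go.values() for t in set(terms))
    broad = {t for t, c in counts.items() if c / n_genes > max_fraction}
    kept = {g: set(terms) - broad for g, terms in gene_to_go.items()}
    return kept, broad


# Toy genome of 200 genes (hypothetical identifiers and term labels).
toy_annotation = {f"g{i}": set() for i in range(200)}
for i in range(10):                      # 10/200 = 5% of genes: broad
    toy_annotation[f"g{i}"].add("GO:broad")
toy_annotation["g0"].add("GO:narrow")    # 1/200 = 0.5% of genes: kept

kept, broad = drop_broad_terms(toy_annotation)
```

Counting a term once per gene keeps duplicated annotations from inflating its genome-wide frequency.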
“…Several causal variant or gene prioritization methods have been developed for human data but not many in plants (Bargsten et al, 2014; Jagadeesh et al, 2016; Kircher et al, 2014; Schaefer et al., 2018). Most prioritization methods have been developed for GWAS mapping in human, an organism where linkage mapping cannot be performed.…”
Section: A Machine-learning Algorithm To Prioritize QTL Causal Genes
mentioning
confidence: 99%
“…A causal gene prioritization is especially helpful for the large QTLs identified by linkage mapping, which can constitute hundreds to thousands of genes. One method has been developed in rice to prioritize causal genes for linkage mapping (Bargsten et al, 2014). This method is based on the hypothesis that causal genes from multiple QTLs of the same trait are more likely to have the same biological process GO terms, and therefore genes with overrepresented biological process GOs were prioritized as causal genes.…”
Section: A Machine-learning Algorithm To Prioritize QTL Causal Genes
mentioning
confidence: 99%
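The hypothesis quoted above (causal genes from multiple QTLs of the same trait tend to share biological-process GO terms, so genes carrying overrepresented terms are prioritized) can be sketched as a simple enrichment test. This is a minimal illustration and not the published implementation: the hypergeometric test, the Bonferroni correction, the `alpha` cutoff, the function names, and the toy gene and term labels are all assumptions made for the example.

```python
from collections import Counter
from math import comb


def hypergeom_tail(k, N, K, n):
    """P(X >= k) when drawing n genes without replacement from N genes,
    K of which carry the GO term."""
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return hits / total


def prioritize(qtl_genes, gene_to_go, alpha=0.05):
    """Return (enriched_terms, prioritized_genes) for one trait.

    qtl_genes  : genes pooled from all QTL regions of the trait
    gene_to_go : dict gene -> set of biological-process GO terms (whole genome)
    alpha      : cutoff on the Bonferroni-adjusted p-value (assumed here)
    """
    qtl = set(qtl_genes) & set(gene_to_go)
    N, n = len(gene_to_go), len(qtl)
    genome_counts = Counter(t for terms in gene_to_go.values() for t in terms)
    qtl_counts = Counter(t for g in qtl for t in gene_to_go[g])

    enriched = {}
    for term, k in qtl_counts.items():
        p = hypergeom_tail(k, N, genome_counts[term], n)
        p_adj = min(1.0, p * len(qtl_counts))  # Bonferroni over tested terms
        if p_adj <= alpha:
            enriched[term] = p_adj

    # A gene is prioritized if it carries at least one enriched term.
    prioritized = {g for g in qtl if gene_to_go[g] & set(enriched)}
    return enriched, prioritized


# Toy genome of 100 genes (hypothetical identifiers and term labels).
gene_to_go = {f"g{i}": set() for i in range(100)}
for i in range(5):
    gene_to_go[f"g{i}"].add("GO:A")      # rare term, 5 genes
for i in range(50, 100):
    gene_to_go[f"g{i}"].add("GO:B")      # common term, 50 genes

qtl_genes = ["g0", "g1", "g2", "g3", "g10", "g11", "g50", "g60", "g70", "g80"]
enriched, prioritized = prioritize(qtl_genes, gene_to_go)
```

In this toy case the rare term appears in four of the ten pooled QTL genes and passes the cutoff, while the common term appears about as often as chance predicts and does not, so only the four genes carrying the rare term are prioritized.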
“…For QTLs identified by linkage mapping, finding causal genes underlying them is still a big bottleneck (Bergelson and Roux, 2010). In a typical rice linkage mapping, the size of a QTL can range from 200kb-3Mb, which can harbor tens to hundreds of genes depending on the mapping population and gene density (Bargsten et al, 2014; Daware et al, 2017). Even in the post-genomic era where all the genes in the genome are uncovered, identifying QTL causal genes is not straightforward since many QTLs either contain no obvious candidate genes or too many genes relevant for the trait (Nuzhdin et al, 1999).…”
Section: Introduction
mentioning
confidence: 99%
“…One method was developed for GWAS in maize based on co-expression networks (Schaefer et al, 2018). Another method was developed for linkage mapping based on biological process GOs (Bargsten et al, 2014). To date, no machine-learning approaches using multiple data types have been developed to address this problem.…”
Linkage mapping is one of the most commonly used methods to identify genetic loci that determine a trait. However, the loci identified by linkage mapping may contain hundreds of candidate genes and require a … and developed a novel computational tool to prioritize causal genes.
Increasing the production of the three major food crops (MFCs), maize (Zea mays), rice (Oryza sativa), and wheat (Triticum aestivum), is essential to fulfilling the food demand for the growing human population. Increasing food production may require the integration of machine learning (ML) into plant breeding programs. However, developing ML tools to improve the production of MFCs is a daunting task due to the lack of quality data and the computation resources needed to process this information. Hence, this review discusses the recent applications of ML for improving MFCs production, including plant phenotyping, yield forecasting, and candidate gene prediction. Based on the challenges reported in recent ML experiments for MFCs, this review prescribes solutions to produce scalable ML models. This review provides valuable insights for future studies and promotes collective efforts among researchers implementing ML to enhance MFCs productivity.