2023
DOI: 10.1101/2023.10.24.563624
Preprint

A Foundational Large Language Model for Edible Plant Genomes

Javier Mendoza-Revilla, Evan Trop, Liam Gonzalez, et al.

Abstract: In recent years, significant progress has been made in the field of plant genomics, demonstrated by the increased use of high-throughput methodologies that allow for the characterization of multiple genome-wide molecular phenotypes. These results have provided valuable insights into plant traits and their underlying genetic mechanisms, especially in well-researched model plant species. Nonetheless, although acquiring and characterizing these molecular phenotypes can offer valuable insights into plant traits, e…


Cited by 4 publications (8 citation statements)
References: 96 publications
“…Based on theory and prior interpretation work on expression models (Mendoza-Revilla et al, 2023), we hypothesized our expression models would also pay most attention to the region surrounding the transcription start site. Looking at the average saliency map for DanQ across all B73 genes on the maximum expression task we see that DanQ indeed focuses on the core promoter region and the 5′ UTR (Figure 4, right).…”
Section: Results (mentioning)
confidence: 99%
“…FNetCompression's performance is particularly remarkable because it has several orders of magnitude fewer parameters than DanQ (57k versus 1.6m, respectively). Large foundation models such as AgroNT (Mendoza-Revilla et al, 2023) show promising results within the training species, but FNetCompression suggests smaller, more efficient models, perhaps also utilizing a fast Fourier transform, are worth further exploration. Since the Pearson correlations we observed are still far from perfect, it is worthwhile to note that we do not expect cis sequence-based models to ever reach perfect correlation, as cis factors explain only a third of the genetic variation in expression in maize (Giri et al, 2021).…”
Section: Discussion (mentioning)
confidence: 99%
“…Therefore, we generated three extremely imbalanced cross-species testing datasets for rice, sorghum, and maize using BUSCO-supported genes as true sites (Figure 2B and Supplemental Table 3). We benchmarked the performance of PlantCaduceus against three DNA LMs: GPN[16], AgroNT[18], and Nucleotide…”
Section: Improving the Accuracy and Cross-species Transferability of ... (mentioning)
confidence: 99%
“…To comprehensively evaluate our foundation model's performance, four foundation models including GPN[16], custom GPN, AgroNT[18] and NT-v2[19] were used as baselines for various tasks.…”
Section: GPN, Custom GPN, AgroNT and NT-v2 Baselines (mentioning)
confidence: 99%