2023
DOI: 10.1101/2023.07.11.548588
Preprint

Uni-RNA: Universal Pre-Trained Models Revolutionize RNA Research

Abstract: RNA molecules play a crucial role as intermediaries in diverse biological processes. Attaining a profound understanding of their function can substantially enhance our comprehension of life’s activities and facilitate drug development for numerous diseases. The advent of high-throughput sequencing technologies makes vast amounts of RNA sequence data accessible, which contains invaluable information and knowledge. However, deriving insights for further application from such an immense volume of data poses a sig…

Cited by 10 publications (5 citation statements) | References 76 publications
“…Compared to UFold 22, which is considered the current state-of-the-art end-to-end method for RNA secondary structure prediction, ERNIE-RNA improves the average binary F1 score by 14.4% to 0.748. Despite having less than one-fifth of the model size and being pretrained on less than one-fiftieth of the data of UNI-RNA 32, ERNIE-RNA outperforms UNI-RNA, the currently largest pre-trained RNA language model, further boosting the average macro-average F1 score by 6.3% to 0.873. We further compared ERNIE-RNA’s prediction performance to RNA-FM at the single-sample level.…”
Section: Results
confidence: 98%
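The binary F1 score cited above treats an RNA secondary structure as a set of base pairs and scores a prediction by precision and recall over those pairs. The following is a minimal Python sketch of that standard metric; it is only an illustration of the definition, not code from ERNIE-RNA, UFold, or Uni-RNA, and the pair-set representation is an assumption.

# Illustrative sketch: binary F1 over predicted vs. reference base pairs of one RNA.
def base_pair_f1(pred_pairs: set[tuple[int, int]], true_pairs: set[tuple[int, int]]) -> float:
    if not pred_pairs and not true_pairs:
        return 1.0                                  # both structures fully unpaired: perfect match
    tp = len(pred_pairs & true_pairs)               # correctly predicted base pairs
    precision = tp / len(pred_pairs) if pred_pairs else 0.0
    recall = tp / len(true_pairs) if true_pairs else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Example: reference structure "((..))" pairs positions (0,5) and (1,4);
# the prediction recovers one true pair and adds one spurious pair.
true_pairs = {(0, 5), (1, 4)}
pred_pairs = {(0, 5), (2, 3)}
print(base_pair_f1(pred_pairs, true_pairs))         # prints 0.5

The per-sample scores produced this way are then averaged over a test set to give the averaged F1 values quoted in the citation statement.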
“…Nowadays, the advancements in high-throughput sequencing technology 26 have produced a wealth of unlabeled data, which contain rich information about RNA structures and functions. Many BERT-style 27 RNA language models trained on abundant RNA sequences have been reported, such as RNA-FM 28, RNABERT 29, RNA-MSM 30, CodonBERT 31 and UNI-RNA 32. RNA-FM is a BERT-based RNA foundation model trained on 23 million unannotated RNA sequences, demonstrating applications in predicting both structural and functional properties.…”
Section: Introduction
confidence: 99%
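The BERT-style models mentioned here are pre-trained by masking random nucleotides and training a Transformer encoder to recover them from context. Below is a minimal PyTorch sketch of that masked-language-model objective on RNA sequences; the vocabulary, tiny model, and toy batch are assumptions for illustration, and this is not the Uni-RNA, RNA-FM, or RNABERT implementation.

# Illustrative sketch: BERT-style masked-token pretraining on RNA sequences.
import torch
import torch.nn as nn

VOCAB = {"<pad>": 0, "<mask>": 1, "A": 2, "C": 3, "G": 4, "U": 5}

def mask_tokens(ids: torch.Tensor, mask_prob: float = 0.15):
    """Randomly replace nucleotides with <mask>; labels are -100 except at masked positions."""
    real = ids >= 2                                   # never mask padding / special tokens
    mask = (torch.rand(ids.shape) < mask_prob) & real
    if not mask.any():                                # guarantee at least one training target
        idx = real.nonzero()[0]
        mask[idx[0], idx[1]] = True
    labels = torch.full_like(ids, -100)
    labels[mask] = ids[mask]
    masked = ids.clone()
    masked[mask] = VOCAB["<mask>"]
    return masked, labels

class TinyRnaLM(nn.Module):
    """A small Transformer encoder with a per-position vocabulary head."""
    def __init__(self, d_model: int = 64, n_layers: int = 2, n_heads: int = 4):
        super().__init__()
        self.embed = nn.Embedding(len(VOCAB), d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, len(VOCAB))

    def forward(self, ids: torch.Tensor) -> torch.Tensor:
        return self.head(self.encoder(self.embed(ids)))

# One illustrative training step on a toy batch of RNA sequences.
seqs = ["GGGAAACCC", "AUGCUAGCUA"]
max_len = max(len(s) for s in seqs)
ids = torch.tensor([[VOCAB[c] for c in s] + [0] * (max_len - len(s)) for s in seqs])
inputs, labels = mask_tokens(ids)

model = TinyRnaLM()
logits = model(inputs)                                # (batch, length, vocab)
loss = nn.CrossEntropyLoss(ignore_index=-100)(logits.reshape(-1, len(VOCAB)), labels.reshape(-1))
loss.backward()                                       # gradients for one masked-LM step

At the scale of the cited models, the same objective is applied to millions of unannotated sequences with far larger encoders; the learned representations are then reused for downstream structure and function prediction tasks.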
“…RNABERT was trained on 76,237 human-derived small ncRNAs. Other methods have been adapted to the RNA language but use MSAs as input, such as RNA-FM (29), Uni-RNA (30) or RNA-MSM (31). Nonetheless, they require multiple sequence alignments (MSAs) as input, which restricts their use for RNAs.…”
Section: Methods
confidence: 99%
“…For proteins, ESMFold (16) has built a successful language model (without the use of MSAs) for protein 3D structure prediction, which achieves competitive results. For RNA, methods have been developed to leverage language-based approaches for RNA structural prediction, like RNA-FM (17), Uni-RNA (18) or RNA-MSM (19). Nonetheless, they require multiple sequence alignments (MSAs) as input, which restricts their use for RNAs.…”
Section: Introduction
confidence: 99%
“…Additionally, and in contrast to MSAs, chemical mapping experiments can be run on arbitrary RNA sequences, allowing the exploration of sequence space beyond natural sequences. As these experiments directly measure structural information, foundation models trained on chemical mapping data could enable better predictions on structure-related tasks for RNAs of interest compared to models trained on natural sequences alone [22][23][24].…”
Section: Introduction
confidence: 99%