2023
DOI: 10.1101/2023.02.01.526559
Preprint

The Master Database of All Possible RNA Sequences and Its Integration with RNAcmap for RNA Homology Search

Abstract: Recent success of AlphaFold2 in protein structure prediction relied heavily on co-evolutionary information derived from homologous protein sequences found in the huge, integrated database of protein sequences (Big Fantastic Database). In contrast, the existing nucleotide databases were not consolidated to facilitate wider and deeper homology search. Here, we built a comprehensive database by including the noncoding RNA sequences from RNAcentral, the transcriptome assembly and metagenome assembly from MG-RAST, …

Citations: cited by 5 publications (6 citation statements)
References: 41 publications (67 reference statements)

“…Deeper MSAs for RNAs can be constructed using the new RNA database published by Chen et al. [77]. To improve the input ss we recommend to use a consensus ss predicted by multiple methods, especially the latest ML-based methods [78].…”
Section: Discussion (mentioning)
confidence: 99%
“…We downloaded 4069 RNA families (version 14.7) from https://rfam.xfam.org on 09/04/2022. The fully automatic RNAcmap3 for homolog search and sequence alignment (33) was employed for these 4069 RNA families by using their covariance models (CMs) for each family. Although the language model is unsupervised learning, we excluded the Rfam families which contain RNA sequences with experimentally determined structures in order to minimize potential overfitting for structural inference.…”
Section: Methods (mentioning)
confidence: 99%
“…This problem was solved with the development of RNAcmap (30), which integrates BLAST-N, Infernal and a secondary structure predictor such as RNAfold (31) for a fully automatic homology search. RNAcmap was further improved with additional iteration (32) and a large expansion of the sequence database (33).…”
Section: Introduction (mentioning)
confidence: 99%
“…We downloaded 4069 RNA families (version 14.7) from https://rfam.xfam.org on 09/04/2022. The fully automatic RNAcmap3 for homolog search and sequence alignment [34] was employed for these 4069 RNA families by using their covariance models (CM) for each family. Although the language model is unsupervised learning, we excluded the Rfam families which contain RNA sequences with experimentally determined structures to minimize potential over-fitting for structural inference.…”
Section: MSA Generation (mentioning)
confidence: 99%
“…This problem was solved with the development of RNAcmap [31], which integrates BLAST-N, Infernal, and a secondary structure predictor such as RNAfold [32] for a fully automatic homology search. RNAcmap was further improved with additional iteration [33] and a large expansion of sequence database [34]. This work reports an RNA MSA-transformer language model (RNA-MSM) based on homologous sequences generated from RNAcmap3.…”
Section: Introduction (mentioning)
confidence: 99%