2019
DOI: 10.1177/1176934318821080

SAliBASE: A Database of Simulated Protein Alignments

Abstract: Simulated alignments are alternatives to manually constructed multiple sequence alignments for evaluating the performance of multiple sequence alignment tools. The importance of simulated sequences is recognized because their true evolutionary history is known, which is very helpful for reconstructing accurate phylogenetic trees and alignments. However, generating simulated alignments requires expertise in bioinformatics tools and consumes several hours for reconstructing even a few hundred simulated sequenc…

Citations: cited by 5 publications (5 citation statements)
References: 11 publications
“…A considerable number of statistical models and methods have been developed and tested for inferring the evolutionary relationship of nucleotide and protein sequences. [16][17][18] The established models are effective in simulating evolution of real sequences, 19,20 in establishing databases of simulated protein alignments, 21 and in exploring early events in the ecological differentiation of bacterial genomes. 22 However, these methods were mostly developed for simulation of gene or protein sequences which possess high conservatism.…”
Section: Introduction (mentioning)
confidence: 99%
“…Many databases have proliferated in recent days. The most common databases of protein which include large amount of protein sequences are Swiss-Prot, HOMFAM [20], SAliBASE [21,22], PIR, Pfam [23], and BAliBASE [24,25]. There is another database for DNA and RNA sequences such as GenBank, HOMSTRAD, PDB, and RefSeq [2].…”
Section: Widespread Datasets Of Protein (mentioning)
confidence: 99%
“…Each dataset part in SAliBASE contains the corresponding alignment that is considered as a standard to evaluate MSA approaches. There are five parameters that control the database generation such as the sequence number, the rate of insertion, the rate of deletion, the length of the sequence, and indel size as in [21].…”
Section: Widespread Datasets Of Protein (mentioning)
confidence: 99%
“…Simulated sequence evolution datasets have also been used to evaluate MSA tools ( 36–39 ), providing the means to produce larger test sets, a wide range of sequence divergence characteristics, and supporting the generation of DNA-specific benchmarks. However, these previous studies focused on constrained sequences ( 36 ), protein simulations ( 37 , 38 ), or ignored the problem of working with fragmented sequences ( 36–39 ).…”
Section: Introduction (mentioning)
confidence: 99%