2019
DOI: 10.1101/792739
Preprint

ASM-Clust: classifying functionally diverse protein families using alignment score matrices

Abstract: Rapid advances in sequencing technology have resulted in the availability of genomes from …


Cited by 3 publications (4 citation statements)
References: 18 publications
“…Repeat sequences from the Heimdallarchaeum CRISPR arrays were used to blast against the CRISPR repeats we recruited, using CRISPRCasTyper, from multiple databases with a 95% alignment and 95% identity cut-off. The databases include GTDB v. 95 While no homologous CRISPR repeats were found in the entire GTDB database, we found several CRISPR arrays from the Guaymas and Pescadero assemblies with identical repeats to the Heimdallarchaeum CRISPR repeats found in this study, demonstrating the specificity of the CRISPR discovery approach. Since both the Guaymas and Pescadero CRISPR sets comprise assembled sequences that were not de-replicated, the entire CRISPR spacer collection from the recruited CRISPR arrays was de-replicated using a 100% identity cut-off.…”
Section: Phylogenomics (mentioning)
confidence: 55%
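The recruitment criteria in the statement above (95% identity and 95% alignment for repeat matches, followed by de-replication of spacers at 100% identity) can be sketched in a few lines of Python. This is a minimal illustration, not the citing authors' pipeline: the `Hit` record, `passes_cutoffs`, and `dereplicate` are hypothetical names, and reading "95% alignment" as the fraction of the query repeat covered by the alignment is an assumption.

```python
from typing import Iterable, NamedTuple


class Hit(NamedTuple):
    """One row of a tabular BLAST-style result (hypothetical minimal schema)."""
    query: str
    subject: str
    pident: float  # percent identity over the aligned region
    aln_len: int   # alignment length in nucleotides
    qlen: int      # full length of the query repeat


def passes_cutoffs(hit: Hit, min_identity: float = 95.0, min_coverage: float = 95.0) -> bool:
    """Keep hits with >= 95% identity and >= 95% of the query aligned (assumed meaning of '95% alignment')."""
    coverage = 100.0 * hit.aln_len / hit.qlen
    return hit.pident >= min_identity and coverage >= min_coverage


def dereplicate(spacers: Iterable[str]) -> list[str]:
    """Collapse spacers that are 100% identical (case-insensitive), keeping first-seen order."""
    seen: set[str] = set()
    unique: list[str] = []
    for seq in spacers:
        key = seq.upper()
        if key not in seen:
            seen.add(key)
            unique.append(seq)
    return unique


if __name__ == "__main__":
    hits = [
        Hit("rep1", "arrayA", 100.0, 36, 36),
        Hit("rep1", "arrayB", 97.2, 30, 36),  # identity ok, coverage ~83% -> dropped
        Hit("rep1", "arrayC", 94.4, 36, 36),  # identity below 95% -> dropped
    ]
    print([h.subject for h in hits if passes_cutoffs(h)])  # ['arrayA']
    print(dereplicate(["ACGTAC", "acgtac", "TTGACA"]))     # ['ACGTAC', 'TTGACA']
```

This exact-match de-replication does not merge spacers that are reverse complements of each other; whether the original workflow did so is not stated.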
“…The resulting protein sequences were combined with the Asgard archaea integrases/ transposases originally pooled and were clustered together using 95% sequence identity with cd-hit. The resulting 96,367 representative sequences were clustered using ASM-Clust 95 with a sequence subset size of 5,000 to generate the alignment score matrix, using default values for the other settings.…”
Section: Resolution of the Genomic Insertion and Circularization of Heimv1 (mentioning)
confidence: 99%
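The statement above describes building an alignment score matrix from a sequence subset (5,000 sequences in that study): each sequence is scored against a fixed subset of reference sequences, so every protein becomes a vector of scores that can then be embedded and clustered. The sketch below illustrates only that idea. It swaps in a toy Needleman-Wunsch global score (match +1, mismatch -1, linear gap -2) for a real protein aligner with a substitution matrix, and `nw_score` and `alignment_score_matrix` are hypothetical names, not ASM-Clust's own code.

```python
import random


def nw_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Global (Needleman-Wunsch) alignment score with a toy linear scoring scheme."""
    prev = [j * gap for j in range(len(b) + 1)]  # row 0: b aligned against gaps
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[-1]


def alignment_score_matrix(seqs: list[str], subset_size: int, seed: int = 0) -> list[list[int]]:
    """Score every sequence against a random subset of reference sequences.

    Row i is the feature vector for seqs[i]; column j is its score against reference j.
    """
    rng = random.Random(seed)
    k = min(subset_size, len(seqs))
    refs = rng.sample(seqs, k)
    return [[nw_score(s, r) for r in refs] for s in seqs]


if __name__ == "__main__":
    seqs = ["MKTAYIAKQR", "MKTAYIAKQL", "MSTNPKPQRK", "MSTNPKPQRA"]
    for row in alignment_score_matrix(seqs, subset_size=2):
        print(row)
```

Because each row has only `subset_size` columns, the number of alignments grows linearly with the number of sequences rather than quadratically, which is presumably why a subset is used for large inputs such as the 96,367 representatives mentioned above.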
“…Multiheme cytochrome c fold family proteins were obtained from the GTDB using two iterations of sequence recruitment and filtering using a bit score ratio (Rasko, Myers, and Ravel 2005) for each of the five constituent protein families. The resulting sequence sets were merged and dereplicated, and the resulting 5855 proteins sequences were classified using ASM-clust with t-distributed stochastic neighborhood embedding (tSNE) perplexity value set to 500 (Speth and Orphan 2019).…”
Section: Methods (mentioning)
confidence: 99%
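The bit score ratio cited above (Rasko, Myers, and Ravel 2005) divides a hit's bit score by the query's bit score against itself, giving a value typically between 0 and 1 that can be compared across proteins of different lengths. A minimal sketch of one filtering pass follows; the 0.4 cut-off and the function names are hypothetical, since the statement does not give the threshold used, and the original recruitment was run for two iterations per protein family.

```python
def bit_score_ratio(hit_bitscore: float, self_bitscore: float) -> float:
    """BSR = bit score of query vs candidate / bit score of query vs itself."""
    if self_bitscore <= 0:
        raise ValueError("self bit score must be positive")
    return hit_bitscore / self_bitscore


def recruit(hits: dict[str, float], self_bitscore: float, min_bsr: float = 0.4) -> list[str]:
    """Return candidate IDs whose BSR meets a hypothetical threshold, highest BSR first."""
    scores = {cid: bit_score_ratio(bs, self_bitscore) for cid, bs in hits.items()}
    kept = [cid for cid, ratio in scores.items() if ratio >= min_bsr]
    return sorted(kept, key=lambda cid: scores[cid], reverse=True)


if __name__ == "__main__":
    # BSRs: a = 0.90, b = 0.30, c = 0.44
    print(recruit({"a": 450.0, "b": 150.0, "c": 220.0}, self_bitscore=500.0))  # ['a', 'c']
```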
“…The resulting sequence sets were merged and dereplicated, and the resulting 5855 proteins sequences were classified using ASM-clust with t-distributed stochastic neighborhood embedding (tSNE) perplexity value set to 500 (Speth and Orphan 2019) .…”
Section: MAG Phylogeny and Annotation (mentioning)
confidence: 99%
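The perplexity value in the statements above controls how t-distributed stochastic neighbor embedding (t-SNE) sizes each point's neighborhood: for every point, a Gaussian bandwidth is tuned until 2 raised to the entropy (in bits) of that point's neighbor distribution equals the chosen perplexity, so a setting of 500 corresponds to roughly 500 effective neighbors per protein. The sketch below shows only this per-point calibration in plain Python, not the full embedding optimization and not ASM-Clust's implementation; all function names are hypothetical.

```python
import math


def conditional_probs(sq_dists: list[float], beta: float) -> list[float]:
    """p_{j|i} proportional to exp(-beta * d_ij^2) over other points j (beta = 1 / (2 sigma_i^2))."""
    m = min(sq_dists)  # shift for numerical stability; cancels after normalization
    weights = [math.exp(-beta * (d - m)) for d in sq_dists]
    total = sum(weights)
    return [w / total for w in weights]


def perplexity(probs: list[float]) -> float:
    """Perplexity = 2 ** H, where H is the Shannon entropy in bits."""
    h = -sum(p * math.log2(p) for p in probs if p > 0)
    return 2.0 ** h


def calibrate_beta(sq_dists: list[float], target: float, tol: float = 1e-5, max_iter: int = 200) -> float:
    """Bisect on beta until the row's perplexity matches the target (must lie in [1, len(sq_dists)])."""
    lo, hi = 0.0, math.inf
    beta = 1.0
    for _ in range(max_iter):
        perp = perplexity(conditional_probs(sq_dists, beta))
        if abs(perp - target) < tol:
            break
        if perp > target:  # distribution too flat: sharpen with a larger beta
            lo = beta
            beta = beta * 2 if hi == math.inf else (beta + hi) / 2
        else:              # distribution too peaked: widen with a smaller beta
            hi = beta
            beta = (beta + lo) / 2
    return beta


if __name__ == "__main__":
    # Squared distances from one point to its 10 neighbors (the point itself excluded).
    dists = [float(k) for k in range(1, 11)]
    b = calibrate_beta(dists, target=5.0)
    print(round(perplexity(conditional_probs(dists, b)), 3))
```

Because perplexity cannot exceed the number of neighbors, a value as large as 500 only makes sense for data sets with well over 500 points, such as the 5855 proteins described above.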