2019
DOI: 10.1016/j.gpb.2018.10.008

Gclust: A Parallel Clustering Tool for Microbial Genomic Data

Abstract: The accelerating growth of the public microbial genomic data imposes substantial burden on the research community that uses such resources. Building databases for non-redundant reference sequences from massive microbial genomic data based on clustering analysis is essential. However, existing clustering algorithms perform poorly on long genomic sequences. In this article, we present Gclust, a parallel program for clustering complete or draft genomic sequences, where clustering is accelerated with a novel paral…

Citations: cited by 11 publications (6 citation statements)
References: 24 publications
“…These Illumina reads were generated from libraries prepared using the Nextera prep kit (Illumina) and sequenced on an Illumina NextSeq 500 with 150 cycle paired-end sequencing. Duplicate contigs were removed using gclust v1.0 [ 111 ] and these representative contigs were used as a reference for aligning IS-Seq reads as described above.…”
Section: Methods (mentioning)
confidence: 99%
“…These blocks were mapped to NipRG (including mitochondrion and plastid) again using minimap2 v2.17 ( Li 2018 ), and the sequences mapped with ≥90% identity and 80% coverage were removed. The remaining sequences were clustered into nonredundant sequences with identity cutoff of 90% using Gclust v1.0.0 ( Li et al 2019 ) and EUPAN v0.44 ( Hu et al 2017 ) blastCluster. After that, the remaining sequences were mapped to NT database (June 18, 2020) using BLAST+ v2.10.1 ( Camacho et al 2009 ) BLASTN.…”
Section: Methods (mentioning)
confidence: 99%
“…Gclust, a parallel technique was developed [18] to group entire or partial gene data. In this model, a new multithreading mechanism and a rapid gene evaluation technique were adopted by the Sparse Suffix Arrays (SSAs) to speed up the grouping.…”
Section: Literature Survey (mentioning)
confidence: 99%
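
The last statement above mentions sparse suffix arrays (SSAs). As a rough illustration of the general idea only, and not of Gclust's actual implementation, the Python sketch below indexes every `step`-th suffix of a sequence and looks up a pattern by binary search. The function names, the `step` parameter, and the toy sequence are all hypothetical, chosen for this example.

```python
def build_sparse_suffix_array(text, step):
    """Return the start positions of every `step`-th suffix of `text`,
    sorted by the lexicographic order of those suffixes.

    Illustrative only: slicing makes this quadratic in the worst case,
    which is fine for a toy example but not for genome-scale input.
    """
    if step < 1:
        raise ValueError("step must be >= 1")
    return sorted(range(0, len(text), step), key=lambda i: text[i:])


def find_sampled_occurrences(text, ssa, pattern):
    """Return the sampled positions whose suffix starts with `pattern`.

    Only positions present in the sparse array can be reported; matches
    that begin at unsampled offsets are missed by this simple lookup.
    """
    m = len(pattern)

    def prefix(k):
        # First m characters of the k-th smallest sampled suffix.
        return text[ssa[k]:ssa[k] + m]

    # Lower bound: first sampled suffix whose prefix is >= pattern.
    lo, hi = 0, len(ssa)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first sampled suffix whose prefix is > pattern.
    hi = len(ssa)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(ssa[start:lo])


if __name__ == "__main__":
    seq = "ACGTACGTAC"  # hypothetical toy sequence
    ssa = build_sparse_suffix_array(seq, step=2)
    print(ssa)
    print(find_sampled_occurrences(seq, ssa, "GT"))
```

Sampling only every `step`-th suffix shrinks the index by roughly a factor of `step`, at the cost of needing extra work to recover matches that start at unsampled positions.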