2014
DOI: 10.7717/peerj.332

The large-scale blast score ratio (LS-BSR) pipeline: a method to rapidly compare genetic content between bacterial genomes

Abstract: Background. As whole genome sequence data from bacterial isolates becomes cheaper to generate, computational methods are needed to correlate sequence data with biological observations. Here we present the large-scale BLAST score ratio (LS-BSR) pipeline, which rapidly compares the genetic content of hundreds to thousands of bacterial genomes, and returns a matrix that describes the relatedness of all coding sequences (CDSs) in all genomes surveyed. This matrix can be easily parsed in order to identify genetic r…
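
The matrix mentioned in the abstract holds one BLAST score ratio per CDS and genome. As a minimal sketch, assuming the usual BSR definition (the bit score of a CDS aligned against a genome, divided by the bit score of that CDS aligned against itself), the values could be derived roughly as follows. The function name, the input dictionaries, and the bit scores are hypothetical and are not taken from the LS-BSR code.

```python
def bsr_matrix(self_scores, query_scores):
    """Build {cds: {genome: BSR}} from raw bit scores.

    self_scores:  {cds: bit score of the CDS aligned against itself}
    query_scores: {cds: {genome: best bit score of the CDS in that genome}}
    Both inputs are hypothetical stand-ins for parsed BLAST output.
    """
    matrix = {}
    for cds, ref in self_scores.items():
        if ref <= 0:
            raise ValueError(f"non-positive self score for {cds}")
        hits = query_scores.get(cds, {})
        # BSR = query bit score / self bit score, rounded for readability.
        matrix[cds] = {genome: round(score / ref, 3) for genome, score in hits.items()}
    return matrix


# Made-up bit scores for two CDSs in two genomes.
self_scores = {"cds1": 500.0, "cds2": 200.0}
query_scores = {
    "cds1": {"genomeA": 500.0, "genomeB": 250.0},
    "cds2": {"genomeA": 190.0, "genomeB": 0.0},
}
matrix = bsr_matrix(self_scores, query_scores)
```

A value near 1 indicates a close match to the reference CDS, and a value near 0 indicates that no comparable sequence was found in that genome.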

Citations: cited by 201 publications (237 citation statements)
References: 27 publications
“…The 20 EIEC genomes and 37 E. coli and Shigella reference genomes were compared by large-scale BLAST score ratio (LS-BSR) analysis as previously described (29,34,35). Briefly, the predicted protein-encoding genes of each genome that had ≥90% nucleotide sequence identity to each other were assigned to gene clusters with uclust (36).…”
Section: Methods (mentioning)
confidence: 99%
“…The 57 E. coli and Shigella genomes included in the phylogenomic analysis were also compared by LS-BSR analysis, which is a de novo method used to determine gene-based similarity in groups of genomes (34,47) (Table 2). There were 16,418 total gene clusters that were identified in the 57 genomes, which included 1,628 gene clusters with significant similarity (LS-BSR, ≥0.9) that were present in all of the genomes (Table 2).…”
Section: Methods (mentioning)
confidence: 99%
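
The conserved-cluster count in the statement above corresponds to a simple filter on the LS-BSR matrix. The sketch below assumes the matrix is already loaded as a dictionary of `{cluster: {genome: BSR}}` with a value for every genome; the function name and the example values are hypothetical, and only the 0.9 cutoff comes from the quoted text.

```python
def core_clusters(matrix, threshold=0.9):
    """Return cluster IDs whose BSR is >= threshold in every genome.

    Assumes each row holds a value for every genome in the comparison.
    """
    return sorted(
        cluster
        for cluster, row in matrix.items()
        if row and all(value >= threshold for value in row.values())
    )


# Made-up BSR values for three clusters across three genomes.
example = {
    "cluster1": {"g1": 1.0, "g2": 0.95, "g3": 0.91},
    "cluster2": {"g1": 1.0, "g2": 0.40, "g3": 0.92},
    "cluster3": {"g1": 0.9, "g2": 0.90, "g3": 0.90},
}
core = core_clusters(example)
```

Here `cluster2` is excluded because it falls below the cutoff in one genome.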
“…This difference may be indicative of the large number of environments where E. coli can be isolated, in contrast to Shigella species, which are primarily identified as pathogens of humans. The numbers and compositions of genes were also compared using BLAST score ratio (BSR) analysis (40,45). The core genome of E. coli, based on an analysis of 69 genomes, is ~2,155 genes (BSR ≥ 0.80 in 100% of the genomes).…”
Section: Pan-genome Comparisons Between (mentioning)
confidence: 99%
“…To identify genes differentially distributed between the E. coli and Shigella genomes, a large-scale BSR (LS-BSR) analysis was performed on 69 E. coli and 69 Shigella genomes (45). The results demonstrate that several genes, primarily associated with metabolism, are conserved in E. coli isolates and largely absent (n < 2) in Shigella isolates (Table 2); this stands in contrast to a recent study which suggested that no genes could be used to distinguish the two groups (47).…”
Section: Pan-genome Comparisons Between (mentioning)
confidence: 99%
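
A differential-distribution screen of the kind described in the statement above might look like the following sketch. It is not the cited authors' procedure: the function name, the group lists, and the example values are hypothetical, the presence cutoff of 0.8 is an assumption borrowed from the BSR ≥ 0.80 threshold quoted earlier, and "conserved" is interpreted here as present in every genome of the first group. Only the "fewer than 2 genomes" condition comes from the quoted text.

```python
def group_specific(matrix, group_a, group_b, present=0.8, max_in_b=2):
    """Clusters present in all of group_a and in fewer than max_in_b of group_b.

    matrix: {cluster: {genome: BSR}}; genomes missing from a row count as 0.0.
    """
    selected = []
    for cluster, row in matrix.items():
        in_all_a = all(row.get(g, 0.0) >= present for g in group_a)
        count_b = sum(row.get(g, 0.0) >= present for g in group_b)
        if in_all_a and count_b < max_in_b:
            selected.append(cluster)
    return sorted(selected)


# Made-up BSR values: two genomes in group A, three in group B.
bsr = {
    "clusterX": {"a1": 0.99, "a2": 0.97, "b1": 0.10, "b2": 0.95, "b3": 0.00},
    "clusterY": {"a1": 1.00, "a2": 0.98, "b1": 0.99, "b2": 0.97, "b3": 1.00},
    "clusterZ": {"a1": 0.00, "a2": 0.10, "b1": 1.00, "b2": 0.99, "b3": 0.98},
}
hits = group_specific(bsr, ["a1", "a2"], ["b1", "b2", "b3"])
```

Only `clusterX` passes: it is present in both group A genomes and in a single group B genome.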
“…The genes of pB171_90 were predicted and annotated using an in-house annotation pipeline (73). These genes were then detected in a collection of 4,798 E. coli genome assemblies available in GenBank as of November 2016, using large-scale BLAST score ratio (LS-BSR) analysis as previously described (74,75). The pB171_90 protein-coding genes were compared to each genome listed in Table S1 using TBLASTN (76) with composition-based adjustment turned off.…”
Section: Methods (mentioning)
confidence: 99%