2009
DOI: 10.1186/gb-2009-10-8-r85

Community-wide analysis of microbial genome sequence signatures

Abstract: Genome signatures in metagenomic datasets

Genome signatures are used to identify and cluster sequences de novo from an acid biofilm microbial community metagenomic dataset, revealing information about the low-abundance community members.

Abstract Background: Analyses of DNA sequences from cultivated microorganisms have revealed genome-wide, taxa-specific nucleotide compositional characteristics, referred to as genome signatures. These signatures have far-reaching implications for understanding genome ev…
Citations: cited by 507 publications (550 citation statements)
References: 94 publications
“…For some genomes that lacked high strain resolution, such as Methanohalophilus-1, we confirmed the manual binning using an emergent self-organizing map (ESOM) using both the metagenomic data and isolate genomes from Marinobacter, Methanohalophilus, Methanolobus, Halanaerobium and Halomonas isolated species, as described previously 36,43. Tetranucleotide frequencies were calculated for ≥5 kb fragments, with the number of tetranucleotides in each fragment normalized on the basis of the total number of observations in all fragments, with these values robust Z-transformed.…”
Section: Methods; citation type: supporting
confidence: 66%
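The tetranucleotide step quoted above can be sketched as follows. This is a minimal illustration, not the citing authors' pipeline: the function names `tetra_freqs` and `robust_z` are hypothetical, counts are normalized per fragment (one common reading of the normalization described), and the robust Z-transform is assumed to be (value − median) / MAD computed per tetranucleotide across fragments. ESOM tools may define the transform differently.

```python
from itertools import product
from statistics import median

# All 256 tetranucleotides in a fixed order, with a lookup from word to column.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]
INDEX = {t: i for i, t in enumerate(TETRAMERS)}


def tetra_freqs(seq):
    """Return the 256 tetranucleotide frequencies of one fragment.

    Assumption: counts are divided by the fragment's own total number of
    valid (A/C/G/T-only) windows. Windows containing other symbols are skipped.
    """
    counts = [0] * len(TETRAMERS)
    seq = seq.upper()
    for i in range(len(seq) - 3):
        j = INDEX.get(seq[i:i + 4])
        if j is not None:
            counts[j] += 1
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


def robust_z(matrix):
    """Column-wise robust Z-transform: (x - median) / MAD.

    Assumption: MAD is the unscaled median absolute deviation; columns with
    MAD == 0 are mapped to 0.0 to avoid division by zero.
    """
    columns = list(zip(*matrix))
    scaled_columns = []
    for col in columns:
        med = median(col)
        mad = median([abs(v - med) for v in col])
        if mad == 0:
            scaled_columns.append([0.0] * len(col))
        else:
            scaled_columns.append([(v - med) / mad for v in col])
    # Transpose back to one row per fragment.
    return [list(row) for row in zip(*scaled_columns)]


if __name__ == "__main__":
    fragments = ["ACGTACGTACGT", "GGGGCCCCGGGG", "ATATATATATAT"]
    freqs = [tetra_freqs(f) for f in fragments]
    scaled = robust_z(freqs)
    print(len(scaled), len(scaled[0]))
```

The resulting matrix (one row per fragment, 256 columns) is the kind of input an ESOM would then be trained on. The ≥5 kb length filter is left to the caller.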
“…For each contig, we determined the GC content, coverage and the phylogenetic affiliation based on the best hit for each predicted protein in the Uniref90 database 61 (Sept. 2013) following ublast searches. We also constructed emergent self-organizing maps (ESOM) 62 using tetranucleotide frequencies of 5 kb DNA fragments. A combination of these approaches was used to identify the genome.…”
Section: Methods; citation type: mentioning
confidence: 99%
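Two of the per-contig quantities mentioned in this statement, GC content and the 5 kb fragments fed to the ESOM, are simple to compute. The sketch below is illustrative only: `gc_content` and `split_fragments` are hypothetical helper names, ambiguous bases are excluded from the GC denominator by assumption, and trailing pieces shorter than the window are discarded by assumption. Coverage and the Uniref90 best-hit affiliation need read mappings and database searches and are not shown.

```python
def gc_content(seq):
    """Fraction of G and C among the unambiguous bases (A, C, G, T) of a contig."""
    seq = seq.upper()
    gc = sum(1 for base in seq if base in "GC")
    acgt = sum(1 for base in seq if base in "ACGT")
    return gc / acgt if acgt else 0.0


def split_fragments(seq, size=5000):
    """Cut a contig into consecutive non-overlapping fragments of `size` bases.

    Assumption: a trailing piece shorter than `size` is dropped, so contigs
    shorter than `size` yield no fragments.
    """
    return [seq[i:i + size] for i in range(0, len(seq) - size + 1, size)]


if __name__ == "__main__":
    contig = "ACGT" * 3000  # 12,000 bases of toy sequence
    pieces = split_fragments(contig)
    print(len(pieces), [len(p) for p in pieces], gc_content(contig))
```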
“…Open reading frames (ORFs) were predicted on assembled scaffolds using Prodigal 42 . Scaffolds from the Crystal Geyser dataset were binned on the basis of differential coverage abundance patterns using a combination of ABAWACA 6 , ABAWACA2 (https://github.com/CK7), Maxbin2 43 , and tetranucleotide frequency using Emergent Self-Organizing Maps (ESOM) 44 . Genomes were manually curated using % GC content, taxonomic affiliation, and genome completeness.…”
Section: Metagenomics and Metatranscriptomics; citation type: mentioning
confidence: 99%