In 16S rRNA gene sequencing, certain bacterial genera were found to be underrepresented or even missing in taxonomic profiles when using unsuitable primer combinations, outdated reference databases, or inadequate pipeline settings. Concerning the last, quality thresholds as well as bioinformatic settings (i.e., clustering approach, analysis pipeline, and specific adjustments such as truncation) are responsible for a number of observed differences between studies.
The current notion presumes that only one protein is encoded at a given bacterial genetic locus. However, transcription and translation of an overlapping open reading frame (ORF) 186 bp in length were discovered by RNAseq and RIBOseq experiments. This ORF is almost completely embedded in the annotated L,D-transpeptidase gene ECs2385 of Escherichia coli O157:H7 Sakai in the antisense reading frame −3. The ORF is transcribed as part of a bicistronic mRNA, which includes the annotated upstream gene ECs2384, encoding a murein lipoprotein. The transcriptional start site of the operon resides 38 bp upstream of the ECs2384 start codon and is driven by a predicted σ70 promoter, which is constitutively active under different growth conditions. The bicistronic operon contains a ρ-independent terminator just upstream of the novel gene, significantly decreasing its transcription. The novel gene can be stably expressed as an EGFP-fusion protein, and a translationally arrested mutant of ano, unable to produce the protein, shows a growth advantage over the wild type in competitive growth experiments under anaerobiosis. Therefore, the novel antisense overlapping gene is named ano (anaerobiosis responsive overlapping gene). A phylostratigraphic analysis indicates that ano originated very recently de novo by overprinting after the Escherichia/Shigella clade separated from other enterobacteria. Therefore, ano is one of the very rare cases of overlapping genes known in the genus Escherichia.
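To make the idea of an ORF embedded in the antisense strand of another gene concrete, the following is a minimal sketch of a six-frame ORF scan in Python. It is not the method used in the study: the function names, the toy sequence, and the restriction to ATG start codons are assumptions made for illustration. The `offset` it reports is simply the scan offset on each strand; how that maps onto frame labels such as −2 or −3 depends on the convention used relative to the annotated gene.

```python
# Toy six-frame ORF scan (illustrative sketch, not the authors' pipeline).
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODONS = {"ATG"}  # assumption: ATG only; bacteria also use GTG and TTG


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 3):
    """Yield (strand, offset, start, end) for each ORF found.

    start/end are 0-based, end-exclusive coordinates on the forward
    strand; the stop codon is included in the reported span.
    """
    n = len(seq)
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            start = None
            for i in range(offset, n - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon in START_CODONS:
                    start = i  # first in-frame start codon opens the ORF
                elif start is not None and codon in STOP_CODONS:
                    end = i + 3
                    if (end - start) // 3 >= min_codons:
                        if strand == "+":
                            yield (strand, offset, start, end)
                        else:
                            # map reverse-strand coordinates back to forward
                            yield (strand, offset, n - end, n - start)
                    start = None


# Hypothetical 18-nt toy sequence: a forward ORF spanning the whole
# sequence with a short ORF embedded on the opposite strand.
toy = "ATGTTAGCCCATGGCTAA"
orfs = list(find_orfs(toy))
print(orfs)  # [('+', 0, 0, 18), ('-', 0, 3, 12)]
```

In this toy case the reverse-strand ORF (forward coordinates 3 to 12) lies entirely within the forward-strand ORF (0 to 18), which is the kind of non-trivial overlap that annotation pipelines usually exclude.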
Background
Due to the DNA triplet code, it is possible that the sequences of two or more protein-coding genes overlap to a large degree. However, such non-trivial overlaps are usually excluded by genome annotation pipelines and, thus, only a few overlapping gene pairs have been described in bacteria. In contrast, transcriptome and translatome sequencing reveals many signals originating from the antisense strand of annotated genes, of which we analyzed an example gene pair in more detail.
Results
A small open reading frame of Escherichia coli O157:H7 strain Sakai (EHEC), designated laoB (L-arginine responsive overlapping gene), is embedded in reading frame −2 in the antisense strand of ECs5115, encoding a CadC-like transcriptional regulator. This overlapping gene shows evidence of transcription and translation in Luria-Bertani (LB) and brain-heart infusion (BHI) medium based on RNA sequencing (RNAseq) and ribosomal-footprint sequencing (RIBOseq). The transcriptional start site is 289 base pairs (bp) upstream of the start codon and transcription termination is 155 bp downstream of the stop codon. Overexpression of LaoB fused to an enhanced green fluorescent protein (EGFP) reporter was possible. The sequence upstream of the transcriptional start site displayed strong promoter activity under different conditions, whereas promoter activity was significantly decreased in the presence of L-arginine. A strand-specific translationally arrested mutant of laoB provided a significant growth advantage in competitive growth experiments in the presence of L-arginine compared to the wild type, which returned to wild type level after complementation of laoB in trans. A phylostratigraphic analysis indicated that the novel gene is restricted to the Escherichia/Shigella clade and might have originated recently by overprinting leading to the expression of part of the antisense strand of ECs5115.
Conclusions
Here, we present evidence of a novel small protein-coding gene laoB encoded in the antisense frame −2 of the annotated gene ECs5115. Clearly, laoB is evolutionarily young and it originated in the Escherichia/Shigella clade by overprinting, a process which may cause the de novo evolution of bacterial genes like laoB.
Electronic supplementary material
The online version of this article (10.1186/s12862-018-1134-0) contains supplementary material, which is available to authorized users.
Background
One limiting factor of short amplicon 16S rRNA gene sequencing approaches is the use of low DNA amounts in the amplicon generation step. Especially for low-biomass samples, insufficient or even commonly undetectable DNA amounts can limit or prohibit further analysis in standard protocols.
Results
Using a newly established protocol, very low DNA input amounts were found sufficient for reliable detection of bacteria using 16S rRNA gene sequencing compared to standard protocols. The improved protocol includes an optimized amplification strategy using digital droplet PCR (ddPCR). We demonstrate how PCR products are generated even from DNA at concentrations too low to be detected with a Qubit. Importantly, the use of different 16S rRNA gene primers had a greater effect on the resulting taxonomical profiles than the use of high or very low initial DNA amounts.
Conclusion
Our improved protocol takes advantage of ddPCR and allows faithful amplification of very low amounts of template. With this, samples of low bacterial biomass become comparable to those with high amounts of bacteria, since the first and most biasing steps are the same. Besides, it is imperative to state DNA concentrations and volumes used and to include negative controls indicating possible shifts in taxonomical profiles. Despite this, results produced by using different primer pairs cannot be easily compared.
Full-length SSU rRNA gene sequencing allows species-level identification of the microorganisms present in milk samples. Here, we used bulk-tank raw milk samples of two German dairies and detected, using this method, a great diversity of bacteria, archaea, and yeasts within the samples. Moreover, the species-level classification was improved in comparison to short amplicon sequencing. Therefore, we anticipate that this approach might be useful for the detection of possible mastitis-causing species, as well as for the control of spoilage-associated microorganisms. In a proof of concept, we showed that we were able to identify several putative mastitis-causing or mastitis-associated species such as Streptococcus uberis, Streptococcus agalactiae, Streptococcus dysgalactiae, Escherichia coli and Staphylococcus aureus, as well as several Candida species. Overall, the presented full-length approach for the sequencing of SSU rRNA is easy to conduct, able to be standardized, and allows the screening of microorganisms in labs with Illumina sequencing machines.