Background
Alkanes are important components of fossil fuels such as crude oil. The alkane monooxygenase encoded by the alkB gene performs the initial step of alkane degradation under aerobic conditions. The alkB gene is well studied because of its ubiquity and the availability of experimental evidence for its function. The alkBFGHJKL and alkST clusters form a special kind of alkB-type alkane hydroxylase system: together they encode all proteins necessary for converting alkanes into the corresponding fatty acids.
Methods
To explore whether the alkBFGHJKL and alkST clusters are widely distributed, we performed a large-scale analysis of isolate genomes and metagenome-assembled genomes (>390,000 genomes) to identify these clusters, together with the distribution of their taxonomy and niches. In this study, an alk-gene cluster was defined as a set of alk-genes (including but not limited to alkBGHJ) located near each other on a DNA sequence. The alkB genes with alkGHJ located nearby on the same DNA sequence were selected for the investigation of putative alk-clusters.
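The selection rule described above (an alkB gene with alkG, alkH and alkJ nearby on the same sequence) can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: the input format, the function name `find_alk_clusters` and the `max_distance` threshold of 20 kb are all assumptions made for illustration, since the text does not state how "nearby" was quantified.

```python
from collections import defaultdict

# Partner genes named in the text; alkB is the anchor gene.
REQUIRED = {"alkG", "alkH", "alkJ"}


def find_alk_clusters(genes, max_distance=20_000):
    """Return putative alk-gene clusters.

    genes: iterable of (contig, gene_name, start, end) tuples (assumed format).
    max_distance: hypothetical window in bp around alkB; not from the source.
    """
    by_contig = defaultdict(list)
    for contig, name, start, end in genes:
        by_contig[contig].append((name, start, end))

    clusters = []
    for contig, items in by_contig.items():
        for name, start, end in items:
            if name != "alkB":
                continue
            # Collect every alk gene overlapping the window around this alkB.
            nearby = sorted(
                {
                    n
                    for n, s, e in items
                    if n.startswith("alk")
                    and s <= end + max_distance
                    and e >= start - max_distance
                }
            )
            # Keep the alkB only if alkG, alkH and alkJ are all in the window.
            if REQUIRED.issubset(nearby):
                clusters.append((contig, start, nearby))
    return clusters


example = [
    ("c1", "alkB", 1000, 2200),
    ("c1", "alkG", 2300, 2800),
    ("c1", "alkH", 3000, 4400),
    ("c1", "alkJ", 4500, 6100),
    ("c2", "alkB", 500, 1700),       # alkB without nearby partners
    ("c2", "alkG", 900000, 900500),  # too far away to count
]
print(find_alk_clusters(example))
```

In this toy input only the alkB on contig `c1` qualifies; the alkB on `c2` is skipped because its partners are missing or outside the assumed window.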
Results
A total of 120 alk-gene clusters were found in 117 genomes. All 117 genomes are from strains belonging to the α- and γ-proteobacteria. The alkB genes located in alk-gene sets formed a single deeply branched clade. Further analysis showed that similar organization types of alk-genes were observed within closely related species. Although a large number of insertion sequence (IS) elements were observed nearby, they did not lead to a wide spread of the alk-gene cluster. The uneven distribution of these elements indicates that other factors might affect the transmission of alk-gene clusters.
Conclusions
We conducted a systematic bioinformatic study of alk-genes located near each other on a DNA sequence. This benchmark dataset of alk-genes can provide a baseline for exploring their evolutionary and ecological importance in future studies.
Summary
Temperature is a key factor in the growth of microorganisms. Appropriate temperature conditions can improve the chances of isolating currently uncultured microorganisms. The development of metagenomic binning technology has dramatically increased the availability of prokaryotic genomic information, making it more convenient to infer the optimal growth temperature (OGT). Here, we propose CnnPOGTP, a deep-learning-based predictor of prokaryotic OGTs that uses only the k-mer distribution derived from the genomic sequence. The method is annotation-free, and the predicted OGT can be obtained simply by providing the genome sequence to the CnnPOGTP website.
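To make the input representation concrete, the following Python sketch shows one common way to turn a genome sequence into a k-mer distribution. It is an illustration only and not the CnnPOGTP implementation: the function name `kmer_frequencies`, the default `k=2` and the normalization to relative frequencies are assumptions, as the summary does not specify the k-mer length or the preprocessing used.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(sequence, k=2):
    """Relative frequency of every A/C/G/T k-mer in a sequence.

    k and the normalization are illustrative assumptions, not values
    taken from CnnPOGTP. K-mers containing other symbols (e.g. N)
    are ignored.
    """
    seq = sequence.upper()
    # Count all overlapping windows of length k.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Fixed, ordered vocabulary of 4**k k-mers gives a stable feature vector.
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[m] for m in kmers)
    if total == 0:
        return {m: 0.0 for m in kmers}
    return {m: counts[m] / total for m in kmers}


freqs = kmer_frequencies("ACGTACGTNNACGT", k=2)
print(len(freqs), round(sum(freqs.values()), 6))
```

Because the vocabulary is fixed, genomes of any length map to a vector of the same size (4^k entries), which is what allows such a representation to be fed to a model without gene annotation.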
Availability and implementation
http://www.orgene.net/CnnPOGTP.
Supplementary information
Supplementary data are available at Bioinformatics online.