Geptop has performed effectively in the identification of prokaryotic essential genes since its first release in 2013. It estimates gene essentiality for prokaryotes based on orthology and phylogeny. Genome-scale essentiality data are now available for more prokaryotic species, and this information has been collected into public essential gene repositories such as DEG and OGEE. A faster and more accurate toolkit is needed to keep pace with the growing volume of prokaryotic genome data. We updated Geptop by adding more validated essentiality data to the reference set (expanding it from 19 to 37 species) and by introducing multi-process computing to accelerate the computation. Compared with Geptop 1.0 and other gene essentiality prediction models, Geptop 2.0 generates more stable predictions and finishes the computation in less time. The software is available both as an online server and as a downloadable standalone application. We hope that the improved Geptop 2.0 will facilitate research on gene essentiality and the development of novel antibacterial drugs. The gene essentiality prediction tool is available at http://cefg.uestc.cn/geptop.
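The abstract describes Geptop's essentiality estimate only as being based on orthology and phylogeny, with multi-process computing for speed. The Python sketch below illustrates that general idea under explicit assumptions; it is not Geptop's actual scoring formula. It assumes a hypothetical inverse-distance weighting (each reference species holding an essential ortholog contributes 1/d, where d is its phylogenetic distance to the query genome, normalized by the sum of 1/d over all reference species) and spreads per-gene scoring across worker processes with the standard-library `multiprocessing.Pool`. The names `essentiality_score` and `score_genes` and the input layout are hypothetical.

```python
from multiprocessing import Pool
from typing import Dict, List, Tuple

# Hypothetical input: phylogenetic distance (> 0) from the query genome to each reference species.
Distances = Dict[str, float]
# Hypothetical input for one query gene: reference species -> True if its ortholog there is
# essential, False if it is non-essential. Species with no detected ortholog are simply absent.
OrthologCalls = Dict[str, bool]


def essentiality_score(calls: OrthologCalls, distances: Distances) -> float:
    """Inverse-distance-weighted share of reference species with an essential ortholog.

    Assumption: a reference species at distance d carries weight 1/d, so closer
    relatives influence the score more. Returns a value in [0, 1].
    """
    total = sum(1.0 / d for d in distances.values())
    hit = sum(1.0 / distances[sp] for sp, essential in calls.items()
              if essential and sp in distances)
    return hit / total if total > 0 else 0.0


def score_genes(genes: List[Tuple[str, OrthologCalls]], distances: Distances,
                processes: int = 1) -> Dict[str, float]:
    """Score every gene; genes are independent, so the work splits cleanly across processes."""
    args = [(calls, distances) for _, calls in genes]
    if processes <= 1:
        scores = [essentiality_score(c, d) for c, d in args]
    else:
        # On spawn-based platforms (Windows, macOS) call this under `if __name__ == "__main__":`.
        with Pool(processes) as pool:
            scores = pool.starmap(essentiality_score, args)
    return {name: s for (name, _), s in zip(genes, scores)}


if __name__ == "__main__":
    dist = {"A": 0.1, "B": 0.2, "C": 0.5}
    print(score_genes([("gene1", {"A": True, "B": False}), ("gene2", {})], dist))
```

For example, with reference distances of 0.1, 0.2 and 0.5 and an essential ortholog only in the closest species, the score is 10/17 ≈ 0.59. Because each gene is scored independently, the job parallelizes without coordination; with `processes=1` the sketch falls back to a plain loop, which avoids process start-up cost on small inputs.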
Anti-CRISPR proteins (Acrs) can suppress the activity of CRISPR-Cas systems. Some viruses depend on Acrs to insert their genetic material into the host genome, which can promote species diversity. The identification and characterization of Acrs are therefore of vital importance. In this work we developed a random forest-based tool, AcrDetector, to identify Acrs at the whole-genome scale using only six features. In multi-round 5-fold cross-validation (30 different random states), AcrDetector achieved a mean accuracy of 99.65%, a mean recall of 75.84%, a mean precision of 99.24% and a mean F1 score of 85.97%. To demonstrate that AcrDetector can precisely identify real Acrs at the whole-genome scale, we performed a cross-species validation in which 71.43% of real Acrs were ranked in the top 10. We then applied AcrDetector to the latest data, and it accurately identified 3 Acrs that had previously been verified experimentally. A standalone version of AcrDetector is available at https://github.com/RiversDong/AcrDetector. Additionally, our results showed that most Acrs were transferred into their host genomes recently rather than at an early stage.
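The evaluation figures above are standard binary-classification metrics. The following Python sketch (hypothetical helpers, not code from the AcrDetector repository) shows how accuracy, precision, recall and F1 are computed from true and predicted labels, and how a top-k hit rate like the one in the cross-species validation could be computed by ranking each genome's proteins by predicted score. The per-genome ranking protocol and the input layout are assumptions.

```python
from typing import Dict, Sequence, Set


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for labels where 1 = Acr and 0 = non-Acr."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def top_k_hit_rate(genome_scores: Dict[str, Dict[str, float]],
                   true_acrs: Set[str], k: int = 10) -> float:
    """Fraction of known Acrs that rank within the top k proteins of their own genome.

    genome_scores maps genome ID -> {protein ID: predicted Acr score} (hypothetical layout).
    """
    hits = total = 0
    for scores in genome_scores.values():
        top = set(sorted(scores, key=scores.get, reverse=True)[:k])
        for protein in scores:
            if protein in true_acrs:
                total += 1
                hits += protein in top  # bool counts as 0 or 1
    return hits / total if total else 0.0
```

Presumably genuine Acrs are a small minority of the proteins in any genome, so accuracy is dominated by true negatives; this is why recall and F1 say more about detection performance than the 99.65% accuracy alone. Averaging such per-fold metrics over all folds and rounds would give means of the kind reported above.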
Alzheimer’s disease (AD) is a neurodegenerative disease that eventually affects memory and behavior. Because the exact cause of AD remains unknown, identifying biomarkers based on AD risk factors can provide insight into the disease. Several studies have proposed blood microRNAs (miRNAs) as potential biomarkers for AD. Exposure to heavy metals is a potential risk factor for the onset and development of AD. In subjects exposed to lead, which is detectable in the circulatory system, blood cells may show molecular responses to this exposure that resemble those of neurons. In this study we analyzed blood cell-derived miRNAs from a general population as proxies for potentially AD-related mechanisms triggered by lead exposure. We then analyzed these mechanisms in brain tissue from AD subjects and controls. Four miRNAs were identified as associated with lead exposure: hsa-miR-3651, hsa-miR-150-5p and hsa-miR-664b-3p were negatively associated, and hsa-miR-627 was positively associated. All four miRNAs were detected in human brain tissue from AD subjects and controls. Moreover, two miRNAs (miR-3651 and miR-664b-3p) were significantly differentially expressed in AD brains versus controls, in the same direction as their association with lead exposure. The gene targets of these miRNAs were validated for expression in the human brain and were enriched in AD-relevant pathways such as axon guidance. We also identified several AD-relevant transcription factors, such as CREB1, associated with the identified miRNAs. These findings suggest that the identified miRNAs are involved in the development of AD and may be useful in developing new, less invasive biomarkers for monitoring novel therapies or processes involved in AD development.
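The abstract does not state which test was used for pathway enrichment. A common choice is a one-sided hypergeometric test, sketched below in Python using only the standard library; the function name and arguments are hypothetical, and the authors' actual enrichment tool may differ. It returns the probability of observing at least k pathway genes among n miRNA target genes when drawing from N background genes, K of which belong to the pathway (for example, axon guidance).

```python
from math import comb


def hypergeom_enrichment_p(N: int, K: int, n: int, k: int) -> float:
    """One-sided hypergeometric p-value P(X >= k).

    N: background genes (e.g. genes expressed in the human brain)
    K: background genes annotated to the pathway
    n: miRNA target genes tested
    k: target genes that fall in the pathway
    """
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("require 0 <= K <= N and 0 <= n <= N")
    # math.comb returns 0 when its second argument exceeds the first, so
    # impossible overlaps contribute nothing to the sum.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(max(k, 0), min(K, n) + 1))
    return tail / comb(N, n)
```

Because many pathways are usually tested at once, such p-values are normally corrected for multiple testing (for example with the Benjamini-Hochberg procedure) before a pathway is called enriched.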
To better understand the mechanisms of bacterial adaptation to oxygen-containing environments, we explored aerobic living-associated genes in bacteria by comparing the frequencies of Clusters of Orthologous Groups of proteins (COGs) and by analyzing gene expression. We detected 38 COGs at significantly higher frequencies in aerobes than in anaerobes (p < 1e-6). Differential expression analyses between two conditions narrowed the prediction further to 27 aerobe-specific COGs. We then annotated the enzymes associated with these COGs. A literature review revealed that 14 COGs contained enzymes that catalyse oxygen-involved reactions or whose products are involved in aerobic pathways, suggesting important roles in survival in aerobic environments. Additionally, protein-protein interaction analyses and step-length comparisons in metabolic networks suggested that the other 13 COGs may be functionally related to the 14 enzyme-associated COGs, indicating that these genes may also be closely associated with oxygen utilization. Phylogenetic and evolutionary analyses showed that the 27 COGs did not have similar trees and that all were under purifying selection. The divergence times of species containing or lacking aerobic COGs indicated that oxygen-utilizing genes appeared approximately 2.80 Gyr ago. Beyond improving our understanding of oxygen adaptation, our method may be extended to identify genes relevant to other living environments.
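Comparing step lengths in a metabolic network amounts to measuring shortest-path distances. The Python sketch below assumes a hypothetical representation in which the network is an adjacency list of enzymes (or COGs) linked when one reaction's product feeds the next, and uses multi-source breadth-first search to count how many steps separate candidate COGs from the oxygen-related COGs. The graph construction and the function names are assumptions, not the authors' pipeline.

```python
from collections import deque
from typing import Dict, Hashable, Iterable, List


def step_lengths(graph: Dict[Hashable, List[Hashable]],
                 sources: Iterable[Hashable]) -> Dict[Hashable, int]:
    """Multi-source BFS: minimum number of steps from any source node to each reachable node."""
    dist: Dict[Hashable, int] = {}
    queue: deque = deque()
    for s in sources:
        if s not in dist:
            dist[s] = 0
            queue.append(s)
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in dist:  # first visit in BFS order is the shortest distance
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def mean_step_length(graph: Dict[Hashable, List[Hashable]],
                     sources: Iterable[Hashable], targets: Iterable[Hashable]) -> float:
    """Mean step length from the source set to the reachable targets (inf if none reachable)."""
    dist = step_lengths(graph, sources)
    reached = [dist[t] for t in targets if t in dist]
    return sum(reached) / len(reached) if reached else float("inf")
```

Under these assumptions, a shorter mean step length from the 14 enzyme-associated COGs to the 13 candidate COGs than to randomly chosen COGs would be consistent with a functional association between the two groups.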
In prokaryotes, Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) and CRISPR‐associated protein (Cas) systems constitute adaptive immune systems against mobile genetic elements (MGEs). Here, we introduce the Markov cluster algorithm (MCL) into Makarova et al.'s method in order to select a more reasonable profile. Additionally, our new Maximum Continuous Cas Subcluster (MCCS) method helps identify tightly clustered loci. Comparison with two other commonly used programs shows that our method identifies Cas proteins with higher accuracy and a lower Additional Prediction Rate (APR). Moreover, we developed a web‐based server, CasLocusAnno (http://cefg.uestc.cn/CasLocusAnno), that annotates Cas proteins, cas loci and their (sub)types in less than about 28 s after submission of a whole-proteome sequence. Its standalone version can be downloaded at https://github.com/RiversDong/CasLocusAnno.
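MCL clusters a similarity graph by alternating expansion (squaring the column-stochastic matrix, which spreads flow along paths) and inflation (raising entries to a power r > 1 and renormalizing, which strengthens strong flows and weakens weak ones). The pure-Python sketch below implements that basic loop on a small dense matrix. How CasLocusAnno builds its graph (for example, which profile similarities form the edges) is not described here, so the input adjacency matrix is an assumption, and real use would rely on an optimized MCL implementation rather than this O(n^3) loop.

```python
from typing import List, Sequence


def _normalize_columns(m: List[List[float]]) -> None:
    """Scale each column in place so it sums to 1 (column-stochastic)."""
    n = len(m)
    for j in range(n):
        s = sum(m[i][j] for i in range(n))
        if s > 0:
            for i in range(n):
                m[i][j] /= s


def _matmul(a: List[List[float]], b: List[List[float]]) -> List[List[float]]:
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mcl(adjacency: Sequence[Sequence[float]], inflation: float = 2.0,
        max_iter: int = 100, tol: float = 1e-6) -> List[List[int]]:
    """Markov clustering of a symmetric similarity matrix; returns clusters as sorted node lists."""
    n = len(adjacency)
    if n == 0:
        return []
    # Add self-loops, a standard MCL step that damps odd/even oscillation.
    m = [[float(adjacency[i][j]) + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    _normalize_columns(m)
    for _ in range(max_iter):
        expanded = _matmul(m, m)                                   # expansion (power 2)
        inflated = [[v ** inflation for v in row] for row in expanded]
        _normalize_columns(inflated)                               # inflation
        delta = max(abs(inflated[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Each attractor row that keeps mass defines a cluster: the columns it still holds.
    clusters = {frozenset(j for j in range(n) if m[i][j] > tol) for i in range(n)}
    return sorted(sorted(c) for c in clusters if c)
```

Raising `inflation` generally yields smaller, tighter clusters, while values closer to 1 yield coarser ones.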