Jinghong Yu scite author profile

Reference genomes are essential for metagenomic analyses and functional characterization of the human gut microbiota. We present the Culturable Genome Reference (CGR), a collection of 1,520 nonredundant, high-quality draft genomes generated from >6,000 bacteria cultivated from fecal samples of healthy humans. Of the 1,520 genomes, which were chosen to cover all major bacterial phyla and genera in the human gut, 264 are not represented in existing reference genome catalogs. We show that this increase in the number of reference bacterial genomes improves the rate of mapping metagenomic sequencing reads from 50% to >70%, enabling higher-resolution descriptions of the human gut microbiome. We use the CGR genomes to annotate functions of 338 bacterial species, showing the utility of this resource for functional studies. We also carry out a pan-genome analysis of 38 important human gut species, which reveals the diversity and specificity of functional enrichment between their core and dispensable genomes.


Two distinct metacommunities characterize the gut microbiota in Crohn's disease patients

Gao, Jie, et al. 2017


The inflammatory intestinal disorder Crohn's disease (CD) has become a health challenge worldwide. The gut microbiota closely interacts with the host immune system, but its functional impact in CD is unclear. Except for studies on a small number of CD patients, analyses of the gut microbiota in CD have used 16S rDNA amplicon sequencing. Here we employed metagenomic shotgun sequencing to provide a detailed characterization of the compositional and functional features of the CD microbiota, including unannotated bacteria, and investigated its modulation by exclusive enteral nutrition. Based on signature taxa, CD microbiotas clustered into 2 distinct metacommunities, indicating individual variability in CD microbiome structure. Metacommunity-specific functional shifts in CD showed enrichment in producers of the pro-inflammatory hexa-acylated lipopolysaccharide variant and a reduction in the potential to synthesize short-chain fatty acids. Disruption of ecological networks was evident in CD, coupled with a reduction in the growth rates of many bacterial species. Short-term exclusive enteral nutrition had limited impact on the overall composition of the CD microbiota, although functional changes occurred following treatment. The microbiotas in CD patients can be stratified into 2 distinct metacommunities, with the most severely perturbed metacommunity exhibiting functional potentials that deviate markedly from those of healthy individuals, with possible implications for CD pathogenesis.


Fast batch searching for protein homology based on compression and clustering

Sun, 2017, BMC Bioinformatics


Background: In the bioinformatics community, many tasks involve matching a set of protein query sequences against large sequence datasets. To run multiple queries against a database, a commonly used method is to run BLAST on each original query or on the concatenated queries. This is inefficient because it does not exploit the common subsequences shared by the queries. Results: We propose a compression- and cluster-based BLASTP (C2-BLASTP) algorithm to further exploit the joint information among the query sequences and the database. First, the queries and the database are compressed in turn by procedures of redundancy analysis, redundancy removal, and distinction recording. Second, the database is clustered according to the Hamming distance among the subsequences. To improve the sensitivity and selectivity of sequence alignments, ten groups of reduced amino acid alphabets are used. Following this, the hit-finding operator is applied to the clustered database. Furthermore, an execution database is constructed from the potential hits found, with the objective of mitigating the effect of the growing scale of the sequence database. Finally, the homology search is performed in the execution database. Experiments on the NCBI NR database demonstrate the effectiveness of the proposed C2-BLASTP for batch homology searching in a sequence database. The results are evaluated in terms of homology accuracy, search speed, and memory usage. Conclusions: C2-BLASTP achieves results that are competitive with some state-of-the-art methods.
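
The clustering step in this abstract (Hamming distance over subsequences, combined with a reduced amino acid alphabet) can be illustrated with a small sketch. This is not the C2-BLASTP implementation: the alphabet grouping in `GROUPS`, the greedy first-fit strategy, and all function names are assumptions chosen for illustration, and the paper's ten alphabet groups are not reproduced here.

```python
from typing import Dict, List

# Hypothetical reduced alphabet: residues grouped by rough physicochemical
# similarity. This grouping is an assumption, not the one used in the paper.
GROUPS = ["AVLIMC", "FWY", "ST", "NQ", "DE", "KRH", "G", "P"]
REDUCE: Dict[str, str] = {aa: group[0] for group in GROUPS for aa in group}


def reduce_sequence(seq: str) -> str:
    """Map each residue to its group representative; unknown residues become 'X'."""
    return "".join(REDUCE.get(aa, "X") for aa in seq.upper())


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def greedy_cluster(subseqs: List[str], max_dist: int) -> List[List[str]]:
    """Put each subsequence into the first cluster whose representative
    (its first member) is within max_dist in the reduced alphabet;
    otherwise start a new cluster."""
    clusters: List[List[str]] = []
    for s in subseqs:
        reduced = reduce_sequence(s)
        for cluster in clusters:
            if hamming(reduce_sequence(cluster[0]), reduced) <= max_dist:
                cluster.append(s)
                break
        else:
            clusters.append([s])
    return clusters
```

With this grouping, `AVLK` and `ILMR` reduce to the same string, so they fall into one cluster even at `max_dist=0`. This shows how a reduced alphabet lets distinct but chemically similar subsequences be treated as matches.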


Discovery of DNA Motif Utilising an Integrated Strategy Based on Random Projection and Particle Swarm Optimization

Sun et al. 2019, Mathematical Problems in Engineering


During gene expression and regulation, the genetic information in DNA is transferred to protein by means of transcription and translation. Recognizing transcription factor binding sites can help to clarify the evolutionary relations among different sequences. Thus, the problem of recognizing transcription factor binding sites, i.e., motif recognition, plays an important role in understanding the biological functions or meanings of sequences. However, when the established search space contains many noisy subsequences, many optimization algorithms tend to become trapped in local optima. To solve this problem, a particle swarm optimization and random projection-based algorithm (PSORPS) is proposed for recognizing DNA motifs. First, a random projection strategy is employed to filter out the noisy subsequences when constructing the objective space. Moreover, the sequence segments distributed across the majority of DNA sequences can be obtained and used to initialize the PSO population. Then, the motifs of the DNA sequences can be searched automatically by a purpose-designed PSO algorithm in the constructed l-mer objective space. Finally, to alleviate base deviation and further improve recognition accuracy, two operators, associated drift and independent drift, are applied to the optimization results obtained by PSO. Experiments are conducted on real-world biological datasets, and the results verify the effectiveness of the proposed algorithm.
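
As a rough illustration of the random projection idea mentioned in this abstract, the sketch below hashes every l-mer by the letters at k randomly chosen positions and keeps only the buckets whose l-mers come from a sufficient fraction of the input sequences. It is a generic random projection filter under stated assumptions, not the PSORPS algorithm: the function names, the `seed` parameter, and the `min_fraction` threshold are hypothetical, and the PSO search and drift operators are not shown.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Hit = Tuple[int, int]  # (sequence index, offset of the l-mer)


def project_lmers(
    sequences: List[str], l: int, k: int, seed: int = 0
) -> Dict[str, List[Hit]]:
    """Hash every l-mer by the letters at k randomly chosen positions.
    Returns a mapping from bucket key to the l-mers that landed there."""
    if not 0 < k <= l:
        raise ValueError("need 0 < k <= l")
    rng = random.Random(seed)
    positions = sorted(rng.sample(range(l), k))
    buckets: Dict[str, List[Hit]] = defaultdict(list)
    for i, seq in enumerate(sequences):
        for off in range(len(seq) - l + 1):
            lmer = seq[off:off + l]
            key = "".join(lmer[p] for p in positions)
            buckets[key].append((i, off))
    return dict(buckets)


def enriched_buckets(
    buckets: Dict[str, List[Hit]], n_sequences: int, min_fraction: float = 0.5
) -> Dict[str, List[Hit]]:
    """Keep buckets whose l-mers come from at least min_fraction of the
    sequences; the rest are treated as noise and dropped."""
    kept: Dict[str, List[Hit]] = {}
    for key, hits in buckets.items():
        distinct_sequences = {i for i, _ in hits}
        if len(distinct_sequences) >= min_fraction * n_sequences:
            kept[key] = hits
    return kept
```

The l-mers in the surviving buckets are candidates that recur across sequences, which is the kind of segment the abstract describes using to initialize the PSO population. In practice the projection would be repeated with several random position sets, since a single projection can miss a motif whose mutated positions happen to be sampled.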


1,520 reference genomes from cultivated human gut bacteria enable functional microbiome analyses
