Van Hoai Tran scite author profile

Metagenomics studies microorganisms sampled directly from the environment, without traditional culturing. In such studies, the output of the binning step serves as input to later steps of a metagenomic project, such as assembly and annotation. The main challenge in binning is the lack of explicit features in metagenomic reads, especially in short-read datasets. There are two approaches, namely supervised and unsupervised learning. Unfortunately, only about 1% of microorganisms in nature are annotated, which causes problems for supervised approaches when the dataset under study contains unknown species. Previous studies usually assumed that reads with the same taxonomic label have similar k-mer distributions. Our new method uses Natural Language Processing (NLP) techniques to generate feature vectors. Additionally, the paper presents a comprehensive unsupervised framework for applying different embeddings drawn from notable NLP techniques in topic modeling and sentence embedding. The experimental results compare the proposed approach's performance with that of previous studies on simulated datasets and show the feasibility of applying NLP to metagenomic binning. The program can be found at https://github.com/vandinhvyphuong/NLPBimeta.
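To make the "reads as text" idea in the abstract concrete, the following is a minimal sketch, not the NLPBimeta implementation. It assumes each read is treated as a sentence whose words are overlapping k-mers, and it swaps in a plain bag-of-words count vector with cosine similarity in place of the topic-modeling and sentence-embedding techniques the paper names. The function names (kmer_tokens, bag_of_kmers, cosine), the choice k=4, and the toy reads are all hypothetical.

```python
from collections import Counter
from math import sqrt


def kmer_tokens(read, k=4):
    """Split a read into overlapping k-mers, treated here as 'words'."""
    read = read.upper()
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def bag_of_kmers(read, k=4):
    """Build a sparse count vector (k-mer -> frequency) for one read."""
    return Counter(kmer_tokens(read, k))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Toy reads (made up for illustration): two AT-rich, one GC-rich.
reads = [
    "ATATATATATATATAT",
    "TATATATATATATATA",
    "GCGCGCGCGGCGCGCC",
]
vectors = [bag_of_kmers(r) for r in reads]

# Reads with similar k-mer content score higher than dissimilar ones.
sim_same = cosine(vectors[0], vectors[1])
sim_diff = cosine(vectors[0], vectors[2])
```

An unsupervised binning step would then cluster such vectors, so that reads with similar k-mer content land in the same bin. A real pipeline would replace the raw counts with learned embeddings.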


Immediate Velocity Prediction of a Mixed-Traffic Flow in Urban Road Networks with Spatial-Temporal Correlation Analysis

Pham

Lăng

Tran

2018




Van Hoai Tran

Virtual machine allocation in cloud computing for minimizing total execution time on each machine

An MILP-based makespan minimization model for single-machine scheduling problem with splitable jobs and availability constraints

A Novel Metagenomic Binning Framework Using NLP Techniques in Feature Extraction

Immediate Velocity Prediction of a Mixed-Traffic Flow in Urban Road Networks with Spatial-Temporal Correlation Analysis
