Troy Hernandez scite author profile

Troy Hernandez

4 Publications

83 Citation Statements Received

55 Citation Statements Given

How they've been cited

108

How they cite others

Affiliations

Tsinghua University, University of Illinois at Chicago, Systems Analytics (United States)

Publications


Real Time Classification of Viruses in 12 Dimensions

Hernandez, Zheng, et al. 2013, PLoS ONE


The International Committee on Taxonomy of Viruses authorizes and organizes the taxonomic classification of viruses. Thus far, the detailed classifications for all viruses are neither complete nor free from dispute. For example, the current missing label rates in GenBank are 12.1% for family label and 30.0% for genus label. Using the proposed Natural Vector representation, all 2,044 single-segment referenced viral genomes in GenBank can be embedded in R^12. Unlike other approaches, this allows us to determine phylogenetic relations for all viruses at any level (e.g., Baltimore class, family, subfamily, genus, and species) in real time. Additionally, the proposed graphical representation for virus phylogeny provides a visualization of the distribution of viruses in R^12. Unlike the commonly used tree visualization methods, which suffer from uniqueness and existence problems, our representation always exists and is unique. This approach is successfully used to predict and correct viral classification information, as well as to identify viral origins; e.g., a recent public health threat, the West Nile virus, is closer to the Japanese encephalitis antigenic complex based on our visualization. Based on cross-validation results, the accuracy rates of our predictions are as high as 98.2% for Baltimore class labels, 96.6% for family labels, 99.7% for subfamily labels and 97.2% for genus labels.
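
As a rough illustration of how a genome can be mapped to 12 numbers, the sketch below assumes the commonly cited natural vector definition: for each of the four nucleotides, its count, its mean position, and its second central moment of position scaled by count times sequence length. The abstract does not spell out these formulas, so the function name `natural_vector` and the exact normalization are assumptions, not the authors' code.

```python
from collections import defaultdict


def natural_vector(seq, alphabet="ACGT"):
    """Return a 12-dimensional vector for a nucleotide sequence.

    Assumed layout: 4 counts, then 4 mean positions, then 4 scaled
    second central moments, in the order given by `alphabet`.
    """
    seq = seq.upper()
    n = len(seq)
    positions = defaultdict(list)
    for i, base in enumerate(seq, start=1):  # positions are 1-based
        if base in alphabet:
            positions[base].append(i)

    counts, means, moments = [], [], []
    for base in alphabet:
        pos = positions[base]
        n_k = len(pos)
        # A nucleotide that never occurs contributes zeros (assumption).
        mu_k = sum(pos) / n_k if n_k else 0.0
        d2_k = sum((p - mu_k) ** 2 for p in pos) / (n_k * n) if n_k else 0.0
        counts.append(float(n_k))
        means.append(mu_k)
        moments.append(d2_k)
    return counts + means + moments
```

Two genomes could then be compared by the Euclidean distance between their vectors, which is what makes nearest-neighbor label prediction fast.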


Global comparison of multiple-segmented viruses in 12-dimensional genome space

Huang, Zheng, et al. 2014, Molecular Phylogenetics and Evolution


Descriptive Statistics of the Genome: Phylogenetic Classification of Viruses

Hernandez, Yang 2016, Journal of Computational Biology


The typical process for classifying and submitting a newly sequenced virus to the NCBI database involves two steps. First, a BLAST search is performed to determine likely family candidates. That is followed by checking the candidate families with the pairwise sequence alignment tool for similar species. The submitter's judgment is then used to determine the most likely species classification. The aim of this article is to show that this process can be automated into a fast, accurate, one-step process using the proposed alignment-free method and properly implemented machine learning techniques. We present a new family of alignment-free vectorizations of the genome, the generalized vector, that maintains the speed of existing alignment-free methods while outperforming all available methods. This new alignment-free vectorization uses the frequency of genomic words (k-mers), as is done in the composition vector, and incorporates descriptive statistics of those k-mers' positional information, as inspired by the natural vector. We analyze five different characterizations of genome similarity using k-nearest neighbor classification and evaluate these on two collections of viruses totaling over 10,000 viruses. We show that our proposed method performs better than, or as well as, other methods at every level of the phylogenetic hierarchy. The data and R code are available upon request.
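
The idea of combining k-mer frequencies with positional statistics and then classifying by nearest neighbors can be sketched as follows. This is a simplified illustration, not the paper's generalized vector: the choice of statistics (count, mean position, variance), the Euclidean distance, and the names `kmer_position_vector` and `nearest_neighbor_label` are all assumptions made for the example.

```python
import math
from collections import Counter
from itertools import product


def kmer_position_vector(seq, k=2, alphabet="ACGT"):
    """For every possible k-mer, emit (count, mean position, variance)."""
    seq = seq.upper()
    words = ["".join(p) for p in product(alphabet, repeat=k)]
    positions = {w: [] for w in words}
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in positions:  # skip words with ambiguous bases
            positions[word].append(i + 1)  # 1-based start position

    vec = []
    for w in words:
        pos = positions[w]
        c = len(pos)
        mean = sum(pos) / c if c else 0.0
        var = sum((p - mean) ** 2 for p in pos) / c if c else 0.0
        vec.extend([float(c), mean, var])
    return vec


def nearest_neighbor_label(query_seq, labelled_seqs, neighbors=1, k=2):
    """Majority label among the closest labelled sequences.

    `labelled_seqs` is a list of (sequence, label) pairs (hypothetical format).
    """
    q = kmer_position_vector(query_seq, k)
    ranked = sorted(
        labelled_seqs,
        key=lambda item: math.dist(q, kmer_position_vector(item[0], k)),
    )
    votes = Counter(label for _, label in ranked[:neighbors])
    return votes.most_common(1)[0][0]
```

In practice the training vectors would be computed once and cached rather than recomputed per query; the sketch recomputes them only to stay short.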


Flex Scheduling for Bus Arrival Time Prediction

Hernandez 2014, Transportation Research Record


The prediction of bus arrival times is an important element for travel planning. This study used three weeks of Chicago, Illinois, Transit Authority bus route GPS data to compare the performance of several commonly used methods and algorithms. The use of implicit schedules in previous papers was inadequate. The use of additional information, such as recent travel times along the route, is unnecessary. In addition, the use of computationally intensive machine learning algorithms, such as support vector regression, k nearest neighbor regression, and neural networks, is unnecessary. The study used basis expansion functions at various resolutions with linear models and cross-validated the models to determine explicitly the approximate historical interstop travel times for any time of the day and any day of the week. Combining the estimated interstop travel times with the real-time GPS location of a bus resulted in a flex schedule that was independent of scheduled departure or arrival times. Using a flex schedule makes the use of additional GPS information or the use of the machine learning algorithms unnecessary.
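
A minimal sketch of the flex schedule idea, under stated assumptions: the abstract does not say which basis functions were used, so this example uses a piecewise-constant (indicator) basis over time of day, for which the least-squares linear fit reduces to the mean travel time in each bin. The `bins_per_day` parameter stands in for the "resolution", and all function names are hypothetical.

```python
def fit_piecewise_constant(samples, bins_per_day):
    """Fit interstop travel time as a step function of hour of day.

    `samples` is a list of (hour_of_day, travel_seconds) pairs. With an
    indicator basis, the least-squares coefficients are the per-bin means.
    Empty bins fall back to the overall mean (assumption).
    """
    sums = [0.0] * bins_per_day
    counts = [0] * bins_per_day
    for hour, secs in samples:
        b = int(hour / 24 * bins_per_day) % bins_per_day
        sums[b] += secs
        counts[b] += 1
    overall = sum(sums) / max(sum(counts), 1)
    return [s / c if c else overall for s, c in zip(sums, counts)]


def predict(model, hour):
    """Estimated travel time in seconds for one interstop segment."""
    b = int(hour / 24 * len(model)) % len(model)
    return model[b]


def eta_seconds(segment_models, hour):
    """Sum estimated times over the segments between the bus and the stop.

    Simplification: every segment is evaluated at the current hour rather
    than at the time the bus is expected to reach it.
    """
    return sum(predict(m, hour) for m in segment_models)
```

Choosing `bins_per_day` by cross-validation, as the abstract describes for its own basis functions, would mean fitting at several resolutions and keeping the one with the lowest held-out error.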

