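Minzhu Xie scite author profile

H-PoP and H-PoPG: heuristic partitioning algorithms for single individual haplotyping of polyploids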
Motivation: Some economically important plants, including wheat and cotton, have more than two copies of each chromosome. With the decreasing cost and increasing read length of next-generation sequencing technologies, reconstructing the multiple haplotypes of a polyploid genome from its sequence reads has become practical. However, polyploid haplotyping is computationally far more challenging than diploid haplotyping, and few methods address it.

Results: This paper models the polyploid haplotyping problem as an optimal poly-partition problem on the reads, called the Polyploid Balanced Optimal Partition (PBOP) model. For reads sequenced from a k-ploid genome, the model divides the reads into k groups so that the difference between reads in the same group is minimized while the difference between reads in different groups is maximized. When genotype information is available, the model is extended to the Polyploid Balanced Optimal Partition with Genotype constraint (PBOPG) problem. Both models are NP-hard. We propose two heuristic algorithms, H-PoP and H-PoPG, that solve the two models respectively. They are based on dynamic programming combined with a strategy that limits the number of intermediate solutions kept at each iteration. Extensive experiments on simulated and real data show that our algorithms solve the models effectively and are much faster and more accurate than recent state-of-the-art polyploid haplotyping algorithms. The experiments also show that our algorithms handle long reads and deep read coverage effectively and accurately. Furthermore, H-PoP might be applied to help determine the ploidy of an organism.

Availability: https://github.com/MinzhuXie/H-PoPG
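
The partition idea behind the PBOP model can be illustrated with a small brute-force sketch. It assumes that each read is encoded as a map from SNP column to allele (0 or 1). It scores partitions with a simple, hypothetical objective: within-group mismatches minus between-group mismatches. It does not reproduce the paper's exact PBOP/PBOPG objective, its balance or genotype constraints, or the dynamic-programming heuristic used by H-PoP. The helper names `read_distance`, `partition_score` and `best_partition` are illustrative only, and the type hints require Python 3.9 or later.

```python
from itertools import combinations, product

# Assumed encoding: a read maps SNP column index -> allele (0 or 1).
# Columns that the read does not cover are simply absent from the dict.
Read = dict[int, int]


def read_distance(a: Read, b: Read) -> int:
    """Count SNP columns covered by both reads where their alleles disagree."""
    return sum(1 for col in a.keys() & b.keys() if a[col] != b[col])


def partition_score(reads: list[Read], labels: tuple[int, ...]) -> int:
    """Hypothetical objective: within-group distance minus between-group distance.

    Lower is better. Reads in the same group should agree, and reads in
    different groups should disagree.
    """
    score = 0
    for i, j in combinations(range(len(reads)), 2):
        d = read_distance(reads[i], reads[j])
        score += d if labels[i] == labels[j] else -d
    return score


def best_partition(reads: list[Read], k: int) -> tuple[int, ...]:
    """Try every assignment of reads to k groups (k**n options: tiny inputs only)."""
    return min(
        product(range(k), repeat=len(reads)),
        key=lambda labels: partition_score(reads, labels),
    )
```

For example, `best_partition([{0: 0, 1: 0}, {0: 0, 1: 0, 2: 1}, {0: 1, 1: 1}, {1: 1, 2: 0}], 2)` returns `(0, 0, 1, 1)`, which separates the first two reads from the last two. The exhaustive search costs k^n score evaluations for n reads. This growth is why a practical method must bound the number of intermediate solutions it keeps.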

Detecting genome-wide epistases based on the clustering of relatively frequent items

Xie

Jiang

2011

Co-clustering phenome–genome for phenotype classification and disease gene discovery

Hwang

Atluri

Xie

et al. 2012

Understanding the categorization of human diseases is critical for reliably identifying disease causal genes. Recently, genome-wide studies of abnormal chromosomal locations related to diseases have mapped >2000 phenotype–gene relations, which provide valuable information for classifying diseases and identifying candidate genes as drug targets. In this article, a regularized non-negative matrix tri-factorization (R-NMTF) algorithm is introduced to co-cluster phenotypes and genes, and simultaneously detect associations between the detected phenotype clusters and gene clusters. The R-NMTF algorithm factorizes the phenotype–gene association matrix under the prior knowledge from phenotype similarity network and protein–protein interaction network, supervised by the label information from known disease classes and biological pathways. In the experiments on disease phenotype–gene associations in OMIM and KEGG disease pathways, R-NMTF significantly improved the classification of disease phenotypes and disease pathway genes compared with support vector machines and Label Propagation in cross-validation on the annotated phenotypes and genes. The newly predicted phenotypes in each disease class are highly consistent with human phenotype ontology annotations. The roles of the new member genes in the disease pathways are examined and validated in the protein–protein interaction subnetworks. Extensive literature review also confirmed many new members of the disease classes and pathways as well as the predicted associations between disease phenotype classes and pathways.
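
The core operation in R-NMTF is a non-negative matrix tri-factorization X ≈ F S Gᵀ of the phenotype–gene association matrix. Here F and G can be read as soft cluster memberships for phenotypes and genes, and S links phenotype clusters to gene clusters. The sketch below is a plain, unregularized tri-factorization that uses the standard multiplicative updates for the squared Frobenius loss. It omits the network priors and label supervision that distinguish R-NMTF. The function names `nmtf` and `reconstruction_error` and their parameters are hypothetical.

```python
import numpy as np


def nmtf(X, k1, k2, iters=200, seed=0, eps=1e-9):
    """Factorize non-negative X (m x n) as F @ S @ G.T with F, S, G >= 0.

    F (m x k1): soft assignment of rows (e.g. phenotypes) to k1 clusters
    G (n x k2): soft assignment of columns (e.g. genes) to k2 clusters
    S (k1 x k2): association strength between row and column clusters
    Multiplicative updates keep every factor non-negative; eps guards
    against division by zero.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    F = rng.random((m, k1))
    S = rng.random((k1, k2))
    G = rng.random((n, k2))
    for _ in range(iters):
        F *= (X @ G @ S.T) / (F @ S @ G.T @ G @ S.T + eps)
        G *= (X.T @ F @ S) / (G @ S.T @ F.T @ F @ S + eps)
        S *= (F.T @ X @ G) / (F.T @ F @ S @ G.T @ G + eps)
    return F, S, G


def reconstruction_error(X, F, S, G):
    """Squared Frobenius norm of the residual X - F S G^T."""
    return float(np.linalg.norm(X - F @ S @ G.T) ** 2)
```

Calling `nmtf(X, 2, 2, iters=0)` returns the random starting factors, so comparing reconstruction errors before and after the updates shows the fit improving. In R-NMTF, extra regularization and supervision terms would be added to this loss.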

XGBFEMF: An XGBoost-Based Framework for Essential Protein Prediction

Zhong

Sun

Peng

et al. 2018

IEEE Transactions on NanoBioscience

107

Essential proteins, which are vital to maintaining cell life, play an important role in biological research and drug design. As large amounts of biological data related to essential proteins have been generated, an increasing number of computational methods have been proposed. Unlike methods that adopt a single machine learning method or an ensemble machine learning method, this paper proposes a prediction framework named XGBFEMF for identifying essential proteins. The framework includes a SUB-EXPAND-SHRINK method, which constructs composite features from the original features and selects a better feature subset for essential protein prediction. It also includes a model fusion method that yields a more effective prediction model. We carry out experiments on yeast data to assess the performance of XGBFEMF with ROC analysis, accuracy analysis, and top analysis, and we further validate its performance on E. coli data. The test results show that the XGBFEMF framework effectively improves many key performance indicators. In addition, we analyze each step of the XGBFEMF framework. The results show that each step of the SUB-EXPAND-SHRINK method, as well as the multi-model fusion step, improves prediction performance.

Prioritizing Disease Genes by Bi-Random Walk

Xie

Hwang

Kuang

2012

Random walk methods have been successfully applied to prioritizing disease causal genes. In this paper, we propose a bi-random walk algorithm (BiRW), based on a regularization framework for graph matching, that globally prioritizes disease genes for all phenotypes simultaneously. Previous methods perform random walk either on the protein-protein interaction network or on the complete phenome-genome heterogeneous network. BiRW instead performs random walk on the Kronecker product graph between the protein-protein interaction network and the phenotype similarity network. Three variations of BiRW that perform balanced or unbalanced bi-directional random walks are analyzed and compared with other random walk methods. Experiments on the disease phenotype-gene associations in Online Mendelian Inheritance in Man (OMIM) demonstrate that BiRW improves disease gene prioritization over existing methods, ranking more known associations in the top 100 out of nearly 10,000 candidate genes.
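
The link between a bi-directional walk and the Kronecker product graph can be made concrete. The sketch assumes symmetric, degree-normalized networks and a simple restart formulation, a hypothetical setup chosen for illustration rather than the exact BiRW update or its unbalanced variants. Under that setup, one step R ← αGRP + (1 − α)A on the n × m gene–phenotype score matrix equals one step of a random walk with restart on the product graph P ⊗ G. This follows from the identity vec(GRP) = (Pᵀ ⊗ G)vec(R), with Pᵀ = P for a symmetric P. The function names and the value of α are illustrative.

```python
import numpy as np


def normalize(W):
    """Symmetric normalization D^-1/2 W D^-1/2 (assumes no isolated nodes)."""
    inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    return inv_sqrt[:, None] * W * inv_sqrt[None, :]


def bi_walk_matrix_form(G, P, A, alpha=0.8, iters=50):
    """Walk with restart in matrix form: genes propagate via G, phenotypes via P.

    G: gene network (n x n), P: phenotype network (m x m),
    A: known gene-phenotype associations used as the restart matrix (n x m).
    """
    G, P = normalize(G), normalize(P)
    R = A.copy()
    for _ in range(iters):
        R = alpha * G @ R @ P + (1 - alpha) * A
    return R


def walk_on_kronecker_graph(G, P, A, alpha=0.8, iters=50):
    """The same walk on the explicit (n*m)-node product graph P kron G."""
    W = np.kron(normalize(P), normalize(G))
    a = A.flatten(order="F")  # column-stacking vec(A)
    r = a.copy()
    for _ in range(iters):
        r = alpha * W @ r + (1 - alpha) * a
    return r.reshape(A.shape, order="F")
```

Both functions produce the same scores up to floating-point error. The explicit product graph has n·m nodes, which becomes impractical with nearly 10,000 candidate genes, so the matrix form that never materializes the Kronecker product is the practical choice.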

Inferring disease and gene set associations with rank coherence in networks

Hwang

Zhang

Xie

et al. 2011

Recovery of valuable metals from spent lithium ion batteries by smelting reduction process based on FeO–SiO₂–Al₂O₃ slag system

Ren

Xiao

Xie

et al. 2017

Transactions of Nonferrous Metals Society of China

A plasticity model for unidirectional composite materials and its applications in modeling composites testing

Xie

Adams

1995

Composites Science and Technology
