Anjun Ma scite author profile

scGNN is a novel graph neural network framework for single-cell RNA-Seq analyses

Single-cell RNA-sequencing (scRNA-Seq) is widely used to reveal the heterogeneity and dynamics of tissues, organisms, and complex diseases, but its analyses still suffer from multiple grand challenges, including the sequencing sparsity and complex differential patterns in gene expression. We introduce the scGNN (single-cell graph neural network) to provide a hypothesis-free deep learning framework for scRNA-Seq analyses. This framework formulates and aggregates cell–cell relationships with graph neural networks and models heterogeneous gene expression patterns using a left-truncated mixture Gaussian model. scGNN integrates three iterative multi-modal autoencoders and outperforms existing tools for gene imputation and cell clustering on four benchmark scRNA-Seq datasets. In an Alzheimer’s disease study with 13,214 single nuclei from postmortem brain tissues, scGNN successfully illustrated disease-related neural development and the differential mechanism. scGNN provides an effective representation of gene expression and cell–cell relationships. It is also a powerful framework that can be applied to general scRNA-Seq analyses.
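
The idea of formulating and aggregating cell–cell relationships can be illustrated with a small sketch. This is not scGNN's architecture (the abstract describes graph neural networks and iterative autoencoders); it only shows, under simplifying assumptions, how a k-nearest-neighbor cell graph might be built and how each cell's profile could be averaged with its neighbors. The function names `knn_graph` and `aggregate` and the toy expression values are hypothetical.

```python
import math

def knn_graph(cells, k):
    """Map each cell index to its k nearest other cells (Euclidean distance)."""
    graph = {}
    for i, ci in enumerate(cells):
        dists = sorted(
            (math.dist(ci, cj), j) for j, cj in enumerate(cells) if j != i
        )
        graph[i] = [j for _, j in dists[:k]]
    return graph

def aggregate(cells, graph):
    """Replace each cell's profile with the mean of itself and its neighbors."""
    out = []
    for i, ci in enumerate(cells):
        group = [ci] + [cells[j] for j in graph[i]]
        out.append([sum(vals) / len(group) for vals in zip(*group)])
    return out

# Toy expression matrix: 4 cells x 2 genes (made-up values, assumption).
cells = [[0.0, 1.0], [0.0, 3.0], [10.0, 10.0], [10.0, 12.0]]
graph = knn_graph(cells, k=1)
smoothed = aggregate(cells, graph)
```

With these toy values the two low-expression cells are linked to each other, as are the two high-expression cells, so averaging smooths values within each group without mixing the groups.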

Integrative Methods and Practical Challenges for Single-Cell Multi-omics

McDermaid

et al. 2020

Trends in Biotechnology

156

103

SubMito-XGBoost: predicting protein submitochondrial localization by fusing multiple feature information and eXtreme gradient boosting

Qiu

Chen

et al. 2019

147

Motivation: Mitochondria are essential organelles in most eukaryotes. They not only play an important role in energy metabolism but also take part in many critical cytopathological processes. Abnormal mitochondria can trigger a series of human diseases, such as Parkinson's disease, multifactor disorder and Type-II diabetes. Protein submitochondrial localization enables the understanding of protein function in studying disease pathogenesis and drug design. Results: We proposed a new method, SubMito-XGBoost, for protein submitochondrial localization prediction. Three steps are included: (i) the g-gap dipeptide composition (g-gap DC), pseudo-amino acid composition (PseAAC), auto-correlation function (ACF) and Bi-gram position-specific scoring matrix (Bi-gram PSSM) are employed to extract protein sequence features, (ii) Synthetic Minority Oversampling Technique (SMOTE) is used to balance samples, and the ReliefF algorithm is applied for feature selection and (iii) the obtained feature vectors are fed into XGBoost to predict protein submitochondrial locations. SubMito-XGBoost obtained satisfactory prediction results under leave-one-out cross-validation (LOOCV) compared with existing methods. The prediction accuracies of the SubMito-XGBoost method on the two training datasets M317 and M983 were 97.7% and 98.9%, which are 2.8–12.5% and 3.8–9.9% higher than other methods, respectively. The prediction accuracy on the independent test set M495 was 94.8%, which is significantly better than the existing studies. The proposed method also achieves satisfactory predictive performance on plant and non-plant protein submitochondrial datasets. SubMito-XGBoost also plays an important role in new drug design for the treatment of related diseases. Availability and implementation: The source codes and data are publicly available at https://github.com/QUST-AIBBDRC/SubMito-XGBoost/. Supplementary information: Supplementary data are available at Bioinformatics online.
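
As an illustration of the first feature type listed in step (i), the sketch below computes a g-gap dipeptide composition under one common definition: the relative frequency of each ordered pair of the 20 standard amino acids separated by exactly g intervening residues, giving a 400-dimensional vector. The function name, the handling of non-standard residues, and the toy sequence are assumptions; the authors' exact implementation is in their repository and may differ.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def g_gap_dipeptide_composition(sequence, g):
    """Frequency of each residue pair separated by exactly g intervening residues."""
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    counts = dict.fromkeys(pairs, 0)
    total = 0
    for i in range(len(sequence) - g - 1):
        pair = sequence[i] + sequence[i + g + 1]
        if pair in counts:  # skip pairs containing non-standard residues
            counts[pair] += 1
            total += 1
    # Normalize counts to frequencies; an all-zero vector if no valid pair exists.
    return [counts[p] / total if total else 0.0 for p in pairs]

# Toy sequence (assumption): with g=1 the only pairs are "AA" and "CC".
features = g_gap_dipeptide_composition("ACACAC", g=1)
```

Setting g=0 reduces this to the ordinary dipeptide composition of adjacent residues.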

Protein–protein interaction sites prediction by ensemble random forests with synthetic minority oversampling technique

Wang

et al. 2018

133

Motivation: The prediction of protein–protein interaction (PPI) sites is a key to mutation design, catalytic reaction and the reconstruction of PPI networks. It is a challenging task considering the significant abundant sequences and the imbalance issue in samples. Results: A new ensemble learning-based method, Ensemble Learning of synthetic minority oversampling technique (SMOTE) for Unbalancing samples and RF algorithm (EL-SMURF), was proposed for PPI sites prediction in this study. The sequence profile feature and the residue evolution rates were combined for feature extraction of neighboring residues using a sliding window, and the SMOTE was applied to oversample interface residues in the feature space for the imbalance problem. The Multi-dimensional Scaling feature selection method was implemented to reduce feature redundancy and subset selection. Finally, the Random Forest classifiers were applied to build the ensemble learning model, and the optimal feature vectors were inserted into EL-SMURF to predict PPI sites. The performance validation of EL-SMURF on two independent validation datasets showed 77.1% and 77.7% accuracy, which were 6.2–15.7% and 6.1–18.9% higher than the other existing tools, respectively. Availability and implementation: The source codes and data used in this study are publicly available at http://github.com/QUST-AIBBDRC/EL-SMURF/. Supplementary information: Supplementary data are available at Bioinformatics online.
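
The oversampling step can be sketched in a few lines. The code below shows the core SMOTE idea only: pick a minority-class point, pick one of its nearest minority neighbors, and place a synthetic point somewhere on the segment between them. The function name, the parameters `k` and `seed`, and the toy data are assumptions; EL-SMURF's actual settings are not stated in the abstract.

```python
import math
import random

def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority-class neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbors of the chosen base point (excluding itself)
        neighbors = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbor = rng.choice(neighbors)
        u = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + u * (n - b) for b, n in zip(base, neighbor)])
    return synthetic

# Toy minority class (assumption): three 2-D feature vectors.
minority = [[1.0, 1.0], [1.0, 2.0], [2.0, 1.0]]
new_points = smote(minority, n_new=5)
```

Because every synthetic point lies between two real minority samples, the new samples stay inside the region the minority class already occupies in feature space.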

Clustering and classification methods for single-cell RNA-sequencing data

et al. 2019

125

Appropriate ways to measure the similarity between single-cell RNA-sequencing (scRNA-seq) data are ubiquitous in bioinformatics, but using single clustering or classification methods to process scRNA-seq data is generally difficult. This has led to the emergence of integrated methods and tools that aim to automatically process specific problems associated with scRNA-seq data. These approaches have attracted a lot of interest in bioinformatics and related fields. In this paper, we systematically review the integrated methods and tools, highlighting the pros and cons of each approach. We not only pay particular attention to clustering and classification methods but also discuss methods that have emerged recently as powerful alternatives, including nonlinear and linear methods and dimensionality reduction methods. Finally, we focus on clustering and classification methods for scRNA-seq data, in particular, integrated methods, and provide a comprehensive description of scRNA-seq data and download URLs.

QUBIC2: a novel and robust biclustering algorithm for analyses and interpretation of large-scale RNA-Seq data

Xie

Zhang

et al. 2019

Motivation: The biclustering of large-scale gene expression data holds promising potential for detecting condition-specific functional gene modules (i.e. biclusters). However, existing methods do not adequately address a comprehensive detection of all significant bicluster structures and have limited power when applied to expression data generated by RNA-Sequencing (RNA-Seq), especially single-cell RNA-Seq (scRNA-Seq) data, where massive zero and low expression values are observed. Results: We present a new biclustering algorithm, QUalitative BIClustering algorithm Version 2 (QUBIC2), which is empowered by: (i) a novel left-truncated mixture of Gaussian model for an accurate assessment of multimodality in zero-enriched expression data, (ii) a fast and efficient dropouts-saving expansion strategy for functional gene modules optimization using information divergency and (iii) a rigorous statistical test for the significance of all the identified biclusters in any organism, including those without substantial functional annotations. QUBIC2 demonstrated considerably improved performance in detecting biclusters compared to five other widely used algorithms on various benchmark datasets from E. coli, human and simulated data. QUBIC2 also showcased robust and superior performance on gene expression data generated by microarray, bulk RNA-Seq and scRNA-Seq. Availability and implementation: The source code of QUBIC2 is freely available at https://github.com/OSU-BMBL/QUBIC2. Contact: czhang87@iu.edu or qin.ma@osumc.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
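
To make the phrase "left-truncated mixture of Gaussian" more concrete, the sketch below evaluates a textbook mixture of normal densities, each renormalized to the region above a cutoff. The abstract does not give QUBIC2's exact formulation (for example, how values below the cutoff enter the likelihood), so this is an assumption-laden illustration, and the component parameters, the cutoff, and all function names are hypothetical.

```python
import math

def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def normal_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def left_truncated_pdf(x, mu, sigma, cutoff):
    """Density of N(mu, sigma^2) conditioned on x >= cutoff."""
    if x < cutoff:
        return 0.0
    return normal_pdf(x, mu, sigma) / (1.0 - normal_cdf(cutoff, mu, sigma))

def mixture_pdf(x, components, cutoff):
    """components: list of (weight, mu, sigma) with weights summing to 1."""
    return sum(w * left_truncated_pdf(x, mu, s, cutoff) for w, mu, s in components)

# Hypothetical two-component model: a low-expression and a high-expression mode.
components = [(0.6, 0.5, 1.0), (0.4, 4.0, 0.8)]
cutoff = 0.0
step = 0.001
# Midpoint-rule check that the truncated mixture still integrates to about 1.
area = sum(
    mixture_pdf(cutoff + (i + 0.5) * step, components, cutoff) * step
    for i in range(20000)
)
```

Dividing each component by the probability mass above the cutoff is what keeps the truncated mixture a proper density.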

Androgen conspires with the CD8⁺ T cell exhaustion program and contributes to sex bias in cancer

et al. 2022

Sex bias exists in the development and progression of non-reproductive organ cancers, but the underlying mechanisms are enigmatic. Studies so far have focused largely on sexual dimorphisms in cancer biology and socioeconomic factors. Here, we establish a role for CD8⁺ T cell-dependent anti-tumor immunity in mediating sex differences in tumor aggressiveness, which is driven by the gonadal androgen but not sex chromosomes. A male bias exists in the frequency of intratumoral antigen-experienced Tcf7/TCF1⁺ progenitor exhausted CD8⁺ T cells that are devoid of effector activity as a consequence of intrinsic androgen receptor (AR) function. Mechanistically, we identify a novel sex-specific regulon in progenitor exhausted CD8⁺ T cells and a pertinent contribution from AR as a direct transcriptional trans-activator of Tcf7/TCF1. The T cell intrinsic function of AR in promoting CD8⁺ T cell exhaustion in vivo was established using multiple approaches including loss-of-function studies with CD8-specific Ar knockout mice. Moreover, ablation of the androgen-AR axis rewires the tumor microenvironment to favor effector T cell differentiation and potentiates the efficacy of anti-PD-1 immune checkpoint blockade. Collectively, our findings highlight androgen-mediated promotion of CD8⁺ T cell dysfunction in cancer and imply broader opportunities for therapeutic development from understanding sex disparities in health and disease.

Network analyses in microbiome based on high-throughput multi-omics data

Liu

Mathé

et al. 2020

Together with various hosts and environments, ubiquitous microbes interact closely with each other, forming an intertwined system or community. Of interest, shifts in the relationships between microbes and their hosts or environments are associated with critical diseases and ecological changes. While advances in high-throughput omics technologies offer a great opportunity for understanding the structures and functions of the microbiome, it is still challenging to analyse and interpret the omics data. Specifically, the heterogeneity and diversity of microbial communities, compounded with the large size of the datasets, impose a tremendous challenge to mechanistically elucidate the complex communities. Fortunately, network analyses provide an efficient way to tackle this problem, and several network approaches have been proposed recently to improve this understanding. Here, we systematically illustrate the network theories that have been used in biological and biomedical research. Then, we review existing network modelling methods of microbial studies at multiple layers from metagenomics to metabolomics and further to multi-omics. Lastly, we discuss the limitations of present studies and provide a perspective for further directions in support of the understanding of microbial communities.
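
One simple way to picture a microbial network is a correlation-based co-occurrence graph: taxa are nodes, and an edge links two taxa whose abundance profiles across samples are strongly correlated. The abstract does not single out this method, so the sketch below is an assumed example of the general approach, with hypothetical taxon names, made-up abundances, and an arbitrary threshold. Plain Pearson correlation is used for brevity; it is known to be unreliable on compositional data, which is one reason more specialized methods exist.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def correlation_network(abundance, threshold):
    """Edges between taxa whose abundance profiles have |r| >= threshold."""
    edges = {}
    for a, b in combinations(sorted(abundance), 2):
        r = pearson(abundance[a], abundance[b])
        if abs(r) >= threshold:
            edges[(a, b)] = r
    return edges

# Made-up abundances of four hypothetical taxa across four samples.
abundance = {
    "taxon_A": [1.0, 2.0, 3.0, 4.0],
    "taxon_B": [2.0, 4.0, 6.0, 8.0],
    "taxon_C": [4.0, 3.0, 2.0, 1.0],
    "taxon_D": [1.0, 3.0, 1.0, 3.0],
}
edges = correlation_network(abundance, threshold=0.9)
```

In this toy data taxon_A and taxon_B co-vary, taxon_C varies inversely with both, and taxon_D is only weakly correlated with the others, so it ends up with no edges.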

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
