Rui Nie scite author profile

Breeschoten, Timmermans et al. 2017

With efficient sequencing techniques, full mitochondrial genomes are rapidly replacing other widely used markers, such as the nuclear rRNA genes, for phylogenetic analysis but their power to resolve deep levels of the tree remains controversial. We studied phylogenetic relationships of leaf beetles (Chrysomelidae) in the tribes Galerucini and Alticini (root worms and flea beetles) based on full mitochondrial genomes (103 newly sequenced), and compared their performance to the widely sequenced nuclear rRNA genes (full 18S, partial 28S). Our results show that: (i) the mitogenome is phylogenetically informative from subtribe to family level, and the per‐nucleotide contribution to nodal support is higher than that of rRNA genes, (ii) the Galerucini and Alticini are reciprocally monophyletic sister groups, if the classification is adjusted to accommodate several ‘problematic genera’ that do not fit the dichotomy of lineages based on the presence (Alticini) or absence (Galerucini) of the jumping apparatus, and (iii) the phylogenetic results suggest a new classification system of Galerucini with eight subtribes: Oidina, Galerucina, Hylaspina, Metacyclina, Luperina, Aulacophorina, Diabroticina and Monoleptina.

An interpretable deep-learning architecture of capsule networks for identifying cell-type gene expression programs from single-cell RNA-sequencing data

Wang, Yu et al. 2020

Deep learning methods have recently been applied to biological data and have greatly advanced biological research. However, the interpretability of these methods still needs to improve. Here, for the first time, we present scCapsNet, a totally interpretable deep learning model adapted from CapsNet. The scCapsNet model retains the capsule parts of CapsNet but replaces the convolutional neural network part with several parallel fully connected neural networks. We apply scCapsNet to scRNA-seq data. The results show that scCapsNet performs well as a classifier and that the parallel fully connected neural networks function as feature extractors, as we supposed. The model reports the contribution of each extracted feature to cell type recognition, and the evidence shows that some extracted features are nearly orthogonal to each other. After training, by analyzing the internal weights of each neural network that connects the inputs to a primary capsule, together with the contribution of each extracted feature to cell type recognition, the scCapsNet model can relate gene sets from the inputs to cell types. A specific gene set is responsible for identifying its corresponding cell type but does not affect the model's recognition of other cell types. Many well-studied cell type markers fall in the gene set of the corresponding cell type. The internal weights for those well-studied markers differ between primary capsules. The internal weights of the neural network connected to a primary capsule can be viewed as an embedding for genes, converting genes to real-valued, low-dimensional vectors. Furthermore, we mix the RNA expression data of two cells of different cell types and then use the scCapsNet model trained on non-mixed data to predict the cell types in the mixed data. The scCapsNet model predicts cell types in a cell mixture with high accuracy.
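The architecture described in this abstract (parallel fully connected feature extractors feeding a capsule layer) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the ReLU activation, the layer sizes, the random weights, and the use of standard CapsNet dynamic routing with three iterations are all assumptions made for illustration, and the names `forward`, `fc_weights`, and `route_weights` are hypothetical.

```python
import numpy as np

def squash(v, axis=-1, eps=1e-9):
    """Capsule non-linearity: keeps direction, maps vector length into [0, 1)."""
    sq = np.sum(v * v, axis=axis, keepdims=True)
    return (sq / (1.0 + sq)) * v / np.sqrt(sq + eps)

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def forward(x, fc_weights, route_weights, n_iter=3):
    """Forward pass for one cell (illustrative only).

    x:             (n_genes,) expression vector
    fc_weights:    (n_primary, n_genes, dim_primary), one FC net per primary capsule
    route_weights: (n_primary, n_types, dim_primary, dim_type)
    Returns type-capsule lengths (n_types,) and coupling coefficients
    (n_primary, n_types).
    """
    # Parallel fully connected feature extractors; ReLU is an assumption.
    primary = np.maximum(0.0, np.einsum("g,pgd->pd", x, fc_weights))
    # Prediction vectors from every primary capsule to every type capsule.
    u_hat = np.einsum("pd,ptde->pte", primary, route_weights)
    logits = np.zeros(u_hat.shape[:2])
    for _ in range(n_iter):
        c = softmax(logits, axis=1)            # couplings over types, per primary capsule
        s = np.einsum("pt,pte->te", c, u_hat)  # weighted sum into each type capsule
        v = squash(s)
        logits = logits + np.einsum("pte,te->pt", u_hat, v)  # agreement update
    return np.linalg.norm(v, axis=-1), c

# Toy example with random weights and a random "cell" (all values hypothetical).
rng = np.random.default_rng(0)
n_genes, n_primary, dim_primary, n_types, dim_type = 50, 4, 8, 3, 16
fc_weights = rng.normal(scale=0.1, size=(n_primary, n_genes, dim_primary))
route_weights = rng.normal(scale=0.1, size=(n_primary, n_types, dim_primary, dim_type))
cell = rng.random(n_genes)
lengths, coupling = forward(cell, fc_weights, route_weights)
```

In a sketch like this, the length of each type capsule plays the role of a class score, and the coupling coefficients give one plausible reading of how much each extracted feature contributes to each cell type.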

Comparative transcriptome analysis of chemosensory genes in two sister leaf beetles provides insights into chemosensory speciation

et al. 2016, Insect Biochemistry and Molecular Biology

MultiCapsNet: A General Framework for Data Integration and Interpretable Classification

Wang, Miao et al. 2022

Recent progress in experimental biology has generated large amounts of data in different formats and lengths. Deep learning is an ideal tool for complex datasets, but its inherent “black box” nature calls for more interpretability. At the same time, traditional interpretable machine learning methods, such as linear regression or random forest, can only deal with numerical features rather than the modular features often encountered in biology. Here, we present MultiCapsNet (https://github.com/wanglf19/MultiCapsNet), a new deep learning model built on CapsNet and scCapsNet, which offers easy data integration and high model interpretability. To demonstrate the ability of this model as an interpretable classifier for modular inputs, we test MultiCapsNet on three datasets with different data types and application scenarios. First, on the labeled variant call dataset, MultiCapsNet shows classification performance similar to that of a neural network model and provides importance scores for data sources directly, without the extra importance determination step required by the neural network model. The importance scores generated by the two models are highly correlated. Second, on a single-cell RNA sequencing (scRNA-seq) dataset, MultiCapsNet integrates information about protein-protein interaction (PPI) and protein-DNA interaction (PDI). The classification accuracy of MultiCapsNet is comparable to that of the neural network and random forest models. Meanwhile, MultiCapsNet reveals how each transcription factor (TF) or PPI cluster node contributes to the classification of cell type. Third, we compare MultiCapsNet with SCENIC. The results show several cell-type-relevant TFs identified by both methods, further supporting the validity and interpretability of MultiCapsNet.

Adaptation to different host plant ages facilitates insect divergence without a host shift

Segraves, Xue et al. 2015

Host shifts and subsequent adaptation to novel host plants are important drivers of speciation among phytophagous insects. However, there is considerably less evidence for host plant-mediated speciation in the absence of a host shift. Here, we investigated divergence of two sympatric sister elm leaf beetles, Pyrrhalta maculicollis and P. aenescens, which feed on different age classes of the elm Ulmus pumila L. (seedling versus adult trees). Using a field survey coupled with preference and performance trials, we show that these beetle species are highly divergent in both feeding and oviposition preference and specialize on either seedling or adult stages of their host plant. An experiment using artificial leaf discs painted with leaf surface wax extracts showed that host plant chemistry is a critical element that shapes preference. Specialization appears to be driven by adaptive divergence as there was also evidence of divergent selection; beetles had significantly higher survival and fecundity when reared on their natal host plant age class. Together, the results identify the first probable example of divergence induced by host plant age, thus extending how phytophagous insects might diversify in the absence of host shifts.

The draft genome of the specialist flea beetle Altica viridicyanea (Coleoptera: Chrysomelidae)

Xue, Niu, Segraves et al. 2021

Background: Altica (Coleoptera: Chrysomelidae) is a highly diverse and taxonomically challenging flea beetle genus that has been used to address questions related to host plant specialization, reproductive isolation, and ecological speciation. To further evolutionary studies in this interesting group, here we present a draft genome of a representative specialist, Altica viridicyanea, the first Alticinae genome reported thus far. Results: The genome is 864.8 Mb and consists of 4490 scaffolds with an N50 size of 557 kb, which covered 98.6% complete and 0.4% partial insect Benchmarking Universal Single-Copy Orthologs. Repetitive sequences accounted for 62.9% of the assembly, and a total of 17,730 protein-coding gene models and 2462 non-coding RNA models were predicted. To provide insight into host plant specialization of this monophagous species, we examined the key gene families involved in chemosensation, detoxification of plant secondary chemistry, and plant cell wall degradation. Conclusions: The genome assembled in this work provides an important resource for further studies on host plant adaptation and functionally affiliated genes. Moreover, this work also opens the way for comparative genomics studies among closely related Altica species, which may provide insight into the molecular evolutionary processes that occur during ecological speciation.
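The scaffold N50 quoted in this abstract is a standard assembly contiguity statistic. As a brief illustration of how it is usually defined (not the pipeline used in the paper), the sketch below computes N50 from a list of scaffold lengths; the function name `n50` and the toy lengths are hypothetical.

```python
def n50(lengths):
    """Return the N50: the length L such that scaffolds of length >= L
    together cover at least half of the total assembly length."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no scaffold lengths given")

# Hypothetical scaffold lengths, not taken from the paper.
toy_scaffolds = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_scaffolds))  # 70: the two longest scaffolds cover 150 of 300
```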

Contact cuticular hydrocarbons act as a mating cue to discriminate intraspecific variation in Altica flea beetles

Xue, Segraves et al. 2016

scCapsNet: a deep learning classifier with the capability of interpretable feature extraction, applicable for single cell RNA data analysis

Wang, Xin et al. 2018

Preprint

Deep learning methods have recently been applied to biological data and have greatly advanced biological research. However, the interpretability of these methods still needs to improve. Here, for the first time, we present scCapsNet, a totally interpretable deep learning model adapted from CapsNet. The scCapsNet model retains the capsule parts of CapsNet but replaces the convolutional neural network part with several parallel fully connected neural networks. We apply scCapsNet to scRNA-seq data. The results show that scCapsNet performs well as a classifier and that the parallel fully connected neural networks function as feature extractors, as we supposed. The model reports the contribution of each extracted feature to cell type recognition, and the evidence shows that some extracted features are nearly orthogonal to each other. After training, by analyzing the internal weights of each neural network that connects the inputs to a primary capsule, together with the contribution of each extracted feature to cell type recognition, the scCapsNet model can relate gene sets from the inputs to cell types. A specific gene set is responsible for identifying its corresponding cell type but does not affect the model's recognition of other cell types. Many well-studied cell type markers fall in the gene set of the corresponding cell type. The internal weights for those well-studied markers differ between primary capsules. The internal weights of the neural network connected to a primary capsule can be viewed as an embedding for genes, converting genes to real-valued, low-dimensional vectors. Furthermore, we mix the RNA expression data of two cells of different cell types and then use the scCapsNet model trained on non-mixed data to predict the cell types in the mixed data. The scCapsNet model predicts cell types in a cell mixture with high accuracy.

Single-cell RNA sequencing (scRNA-seq) can measure gene expression levels in individual cells. Using scRNA-seq data, it is possible to reveal heterogeneity in a cell population [1,2], identify new cell types, computationally order cells along trajectories [3,4], and infer the spatial coordinates of every individual cell in a population [5,6]. As scRNA-seq data accumulate quickly, it is important to be able to retrieve similar cell types. For example, scMCA suggested a pipeline for cell type determination that compares the input single-cell transcriptome with pre-calculated reference transcriptomes to provide a match score based on gene expression correlation [7]. Since many cell types have already been well defined, supervised learning is an ideal tool to classify undefined cells. Besides the final goal of classification or similar cell type retrieval, the interpretability of the classification process is also important. By demonstrating which features are extracted for obtaining a specific decision and how these features contribute to the decision, the classifi...