It is critical, but difficult, to catch the small variation in genomic or other kinds of data that differentiates phenotypes or categories. A plethora of data is available, but the information from its genes or elements is spread over arbitrarily, making it challenging to extract relevant details for identification. However, an arrangement of similar genes into clusters makes these differences more accessible and allows for robust identification of hidden mechanisms (e.g. pathways) than dealing with elements individually. Here we propose, DeepInsight, which converts non-image samples into a well-organized image-form. Thereby, the power of convolution neural network (CNN), including GPU utilization, can be realized for non-image samples. Furthermore, DeepInsight enables feature extraction through the application of CNN for non-image samples to seize imperative information and shown promising results. To our knowledge, this is the first work to apply CNN simultaneously on different kinds of non-image datasets: RNA-seq, vowels, text, and artificial.
Artificial intelligence methods offer exciting new capabilities for the discovery of biological mechanisms from raw data because they are able to detect vastly more complex patterns of association that cannot be captured by classical statistical tests. Among these methods, deep neural networks are currently among the most advanced approaches and, in particular, convolutional neural networks (CNNs) have been shown to perform excellently for a variety of difficult tasks. Despite that applications of this type of networks to high-dimensional omics data and, most importantly, meaningful interpretation of the results returned from such models in a biomedical context remains an open problem. Here we present, an approach applying a CNN to nonimage data for feature selection. Our pipeline, DeepFeature, can both successfully transform omics data into a form that is optimal for fitting a CNN model and can also return sets of the most important genes used internally for computing predictions. Within the framework, the Snowfall compression algorithm is introduced to enable more elements in the fixed pixel framework, and region accumulation and element decoder is developed to find elements or genes from the class activation maps. In comparative tests for cancer type prediction task, DeepFeature simultaneously achieved superior predictive performance and better ability to discover key pathways and biological processes meaningful for this context. Capabilities offered by the proposed framework can enable the effective use of powerful deep learning methods to facilitate the discovery of causal mechanisms in high-dimensional biomedical data.
Motivation Advances in next-generation sequencing have made it possible to carry out transcriptomic studies at single-cell resolution and generate vast amounts of single-cell RNA sequencing (RNA-seq) data rapidly. Thus, tools to analyze this data need to evolve as well as to improve accuracy and efficiency. Results We present FEATS, a Python software package, that performs clustering on single-cell RNA-seq data. FEATS is capable of performing multiple tasks such as estimating the number of clusters, conducting outlier detection and integrating data from various experiments. We develop a univariate feature selection-based approach for clustering, which involves the selection of top informative features to improve clustering performance. This is motivated by the fact that cell types are often manually determined using the expression of only a few known marker genes. On a variety of single-cell RNA-seq datasets, FEATS gives superior performance compared with the current tools, in terms of adjusted Rand index and estimating the number of clusters. It achieves a 22% improvement in clustering and more accurately estimates the number of clusters when compared with other tools. In addition to cluster estimation, FEATS also performs outlier detection and data integration while giving an excellent computational performance. Thus, FEATS is a comprehensive clustering tool capable of addressing the challenges during the clustering of single-cell RNA-seq data. Availability The installation instructions and documentation of FEATS is available at https://edwinv87.github.io/feats/. Supplementary Data Supplementary data are available online at https://academic.oup.com/bib.
Identifying smaller element or gene subsets from biological or other data types is an essential step in discovering underlying mechanisms. Statistical machine learning methods have played a key role in revealing gene subsets. However, growing data complexity is pushing the limits of these techniques. A review of the recent literature shows that arranging elements by similarity in image-form for a convolutional neural network (CNN) improves classification performance over treating them individually. Expanding on this, here we show a pipeline, DeepInsight-FS, to uncover gene subsets of clinical relevance. DeepInsight-FS converts non-image samples into image-form and performs element selection via CNN. To our knowledge, this is the first approach to employ CNN for element or gene selection on non-image data. A real world application of DeepInsight-FS to publicly available cancer data identified gene sets with significant overlap to several cancer-associated pathways suggesting the potential of this method to discover biomedically meaningful connections.
We present FeatClust, a software tool for clustering small sample size single-cell RNA-Seq datasets. The FeatClust approach is based on feature selection. It divides features into several groups by performing agglomerative hierarchical clustering and then iteratively clustering the samples and removing features belonging to groups with the least variance across samples. The optimal number of feature groups is selected based on silhouette analysis on the clustered data, i.e., selecting the clustering with the highest average silhouette coefficient. FeatClust also allows one to visually choose the number of clusters if it is not known, by generating silhouette plot for a chosen number of groupings of the dataset. We cluster five small sample single-cell RNA-seq datasets and use the adjusted rand index metric to compare the results with other clustering packages. The results are promising and show the effectiveness of FeatClust on small sample size datasets.
Advances in next-generation sequencing (NGS) have made it possible to carry out transcriptomic studies at single-cell resolution and generate vast amounts of single-cell RNA-seq data rapidly. Thus, tools to analyze this data need to evolve as well to improve accuracy and efficiency. We present FEATS, a python software package that performs clustering on single-cell RNA-seq data. FEATS is capable of performing multiple tasks such as estimating the number of clusters, conducting outlier detection, and integrating data from various experiments. We develop a univariate feature selection based approach for clustering, which involves the selection of top informative features to improve clustering performance. This is motivated by the fact that cell types are often manually determined using the expression of only a few known marker genes. On a variety of single-cell RNA-seq datasets, FEATS gives superior performance compared to the current tools, in terms of adjusted rand index (ARI) and estimating the number of clusters. In addition to cluster estimation, FEATS also performs outlier detection and data integration while giving an excellent computational performance. Thus, FEATS is a comprehensive clustering tool capable of addressing the challenges during the clustering of single-cell RNA-seq data. The installation instructions and documentation of FEATS is available at https://edwinv87.github.io/feats/.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.