Dimensionality reduction algorithms are commonly used to reduce the dimension of multi-dimensional data so that it can be visualized on a standard display. Although many dimensionality reduction algorithms, such as t-distributed Stochastic Neighbor Embedding (t-SNE), aim to preserve close neighborhoods in the low-dimensional space, they might not accomplish that for every sample and can therefore produce erroneous representations. In this study, we developed a supervised confidence estimation algorithm for detecting erroneous samples in embeddings. Our algorithm generates a confidence score for each sample in an embedding based on a distance-oriented score and a random forest regressor. We evaluated its performance on both intra- and inter-domain data and compared it with the neighborhood preservation ratio as our baseline. Our results showed that, compared to the baseline, the resulting confidence score provides distinctive information about the correctness of any sample in an embedding. The source code is available at https://github.com/gsaygili/dimred.
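The neighborhood preservation ratio used as the baseline above can be illustrated with a minimal sketch. It assumes the common definition (for each sample, the fraction of its k nearest neighbors in the original space that are still among its k nearest neighbors in the embedding) and Euclidean distance. The function names and toy data are hypothetical and are not taken from the linked repository.

```python
import math


def k_nearest(points, i, k):
    """Return the indices of the k points closest to points[i] (Euclidean)."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: math.dist(points[i], points[j]))
    return set(others[:k])


def neighborhood_preservation(high, low, k):
    """Per-sample share of high-dimensional k-NN that survive in the embedding."""
    return [
        len(k_nearest(high, i, k) & k_nearest(low, i, k)) / k
        for i in range(len(high))
    ]


# Toy data (hypothetical): four 2-D samples forming two tight pairs.
high = [(0, 0), (1, 0), (5, 0), (6, 0)]

# A 1-D embedding that keeps each pair together ...
low_faithful = [(0,), (1,), (5,), (6,)]
# ... and one that swaps samples 1 and 2, splitting both pairs.
low_scrambled = [(0,), (5,), (1,), (6,)]

faithful = neighborhood_preservation(high, low_faithful, k=1)
scrambled = neighborhood_preservation(high, low_scrambled, k=1)
print(faithful)   # every sample keeps its nearest neighbor
print(scrambled)  # no sample keeps its nearest neighbor
```

A per-sample score of this kind only counts shared neighbors; the confidence score described in the abstract additionally relies on a distance-oriented score and a trained regressor, which this sketch does not attempt to reproduce.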
Ventricles of the human brain enlarge with aging, neurodegenerative diseases, and intrinsic and extrinsic pathologies. The morphometric examination of neuroimages is an effective approach to assessing structural changes that occur due to diseases such as hydrocephalus. In this study, we explored the effectiveness of commonly used morphological parameters in hydrocephalus diagnosis. For this purpose, six common morphometric parameters, namely Frontal Horns' Length (FHL), Maximum Lateral Length (MLL), Biparietal Diameter (BPD), Evans' Ratio (ER), Cella Media Ratio (CMR), and Frontal Horns' Ratio (FHR), were compared in terms of their importance in predicting hydrocephalus using a Random Forest classifier. The experimental results demonstrated that hydrocephalus can be detected with 91.46% accuracy using all of these measurements. The accuracy of classification using only CMR and FHL reached up to 93.33%. In terms of individual performance, CMR and FHL were the top performers, whereas BPD and FHR did not contribute as much to the overall accuracy.
Deep learning (DL) algorithms have achieved important successes in data analysis tasks, thanks to their capability of revealing complex patterns in data. With advances in sensors, data storage, and processing hardware, DL algorithms have started to dominate various fields, including neuropsychiatry. There are many types of DL algorithms for different data types, from survey data to functional magnetic resonance imaging scans. Because of limitations in diagnosing neuropsychiatric disorders and in estimating their prognosis and treatment response, DL algorithms are becoming promising approaches. In this review, we aim to summarize the most common DL algorithms and their applications in neuropsychiatry, and to provide an overview that guides researchers in choosing the proper DL architecture for their research.
Arguably one of the most famous dimensionality reduction algorithms today is t-distributed stochastic neighbor embedding (t-SNE). Although it is widely used for the visualization of single-cell RNA sequencing (scRNA-seq) data, it is prone to errors like any algorithm and may lead to inaccurate interpretations of the visualized data. A reasonable way to avoid misinterpretations is to quantify the reliability of the visualizations. The focus of this work is first to find the best possible way to predict sample-based confidence scores for t-SNE embeddings and then to use these confidence scores to improve clustering algorithms. We adopt a random forest (RF) regression algorithm that uses features computed with seven different distance measures to predict the sample-based confidence scores. The best configuration is used to assess the clustering improvement with K-means and Density-Based Spatial Clustering of Applications with Noise (DBSCAN), based on Adjusted Rand Index (ARI), Normalized Mutual Information (NMI), and accuracy (ACC) scores. The experimental results show that the choice of distance measure has a considerable effect on the precision of the confidence scores, and that clustering performance can be improved substantially if these confidence scores are incorporated before the clustering algorithm. Our findings reveal the usefulness of these confidence scores in downstream analyses of scRNA-seq data.
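Of the evaluation scores named above, the Adjusted Rand Index can be computed directly from the contingency table of two labelings. The following is a minimal sketch of the standard formula, assuming at least two samples. The function name and the toy labels are hypothetical and do not come from the study.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index between two labelings of the same samples."""
    n = len(labels_true)
    # Pairs of samples that share a cluster in both labelings.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    # Pairs that share a cluster within each labeling on its own.
    sum_rows = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_rows * sum_cols / comb(n, 2)  # agreement expected by chance
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        # Degenerate case, e.g. both labelings put everything in one cluster.
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


# Toy labels (hypothetical): cluster names differ, but the grouping is identical.
same_grouping = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
# Here every true pair is split by the prediction.
split_grouping = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
print(same_grouping, split_grouping)
```

Because the index is corrected for chance, it is invariant to how the clusters are named and can become negative when the agreement is worse than random.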