High-throughput biological data need to be processed, analyzed, and interpreted to address problems in the life sciences. Bioinformatics, computational biology, and systems biology tackle biological problems with computational methods. Clustering is one of the methods used to gain insight into biological processes, particularly at the genomics level. Although clustering can be used in many areas of biological data analysis, this paper reviews current clustering algorithms designed specifically for analyzing gene expression data. It also aims to introduce one of the main problems in bioinformatics, clustering gene expression data, to the operations research community.
Persistent homology, a topological data analysis (TDA) method, is applied to microarray data sets. Although a few papers have applied TDA methods to microarray analysis, to the best of our knowledge persistent homology has not previously been used to compare weighted gene coexpression networks (WGCNs). We calculate the persistent homology of weighted networks constructed from 38 Arabidopsis microarray data sets to test how relevant and successful this approach is in distinguishing stress factors. We quantify multiscale topological features of each network using persistent homology and apply a hierarchical clustering algorithm to the distance matrix whose entries are the pairwise bottleneck distances between the networks. Our method distinguishes the immune responses to different stress factors. Networks with similar immune responses are close in bottleneck distance, indicating that their WGCNs share similar topological features. This computationally efficient technique for analyzing networks provides a quick test that can inform more advanced studies.
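The network-and-persistence step described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes a WGCNA-style soft threshold a_ij = |cor(x_i, x_j)|^beta with an illustrative beta = 6, a filtration in which edge (i, j) appears at time 1 - a_ij, and it computes only dimension-0 persistence (connected components) with union-find. The helper names `pearson`, `wgcn_adjacency`, and `h0_persistence` are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles (assumes neither is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def wgcn_adjacency(expr, beta=6):
    """Soft-thresholded coexpression weights a_ij = |cor(x_i, x_j)| ** beta.

    The WGCNA-style soft threshold and beta = 6 are illustrative assumptions.
    """
    g = len(expr)
    adj = [[0.0] * g for _ in range(g)]
    for i, j in combinations(range(g), 2):
        adj[i][j] = adj[j][i] = abs(pearson(expr[i], expr[j])) ** beta
    return adj


def h0_persistence(adj):
    """Dimension-0 persistence diagram of a weighted network.

    Edge (i, j) enters the filtration at time 1 - a_ij, so strongly coexpressed
    genes connect first. Every gene is born at time 0; each edge that merges
    two components kills one class (Kruskal-style union-find).
    """
    g = len(adj)
    parent = list(range(g))

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    edges = sorted((1.0 - adj[i][j], i, j) for i, j in combinations(range(g), 2))
    diagram = []
    for t, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            diagram.append((0.0, t))
    diagram.append((0.0, math.inf))  # one component never dies
    return diagram
```

Calling `h0_persistence(wgcn_adjacency(expr))` on each data set yields one diagram per network; pairwise bottleneck distances between such diagrams could then be computed with a TDA library (for example GUDHI's `bottleneck_distance`) and passed to hierarchical clustering. Higher-dimensional features, which the study's persistent homology may also include, are not covered by this sketch.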
Biological networks, social networks, and the World Wide Web are examples of real-world networks that exhibit community structure. We present a concise review of community structure finding (CSF) algorithms and their applications. We apply a CSF algorithm and several other algorithms to three different microarray data sets, and we calculate modularity and C-rand indices as measures of the quality of each clustering. We compare the performance of the CSF algorithm with that of four other algorithms, namely the hierarchical clustering (HC) algorithm, K-means, the dynamic tree cut (DTC) algorithm, and Naive Bayes Clustering (NBC), using both C-rand and modularity values. We report that the CSF algorithm detects clusters with high modularity; however, its clusters do not reach C-rand values as high as those of the other methods.
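The two quality measures used in that comparison can be sketched as follows. This is a minimal illustration, assuming Newman's modularity Q = (1/2m) Σ_ij [A_ij - k_i k_j / 2m] δ(c_i, c_j) for an undirected weighted graph given as a symmetric matrix, and assuming that "C-rand" denotes the corrected (adjusted) Rand index of Hubert and Arabie; the function names are hypothetical.

```python
from collections import Counter
from math import comb


def modularity(adj, labels):
    """Newman's modularity Q of a partition of an undirected (weighted) graph."""
    n = len(adj)
    k = [sum(row) for row in adj]  # (weighted) degrees
    two_m = sum(k)                 # twice the total edge weight
    q = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:
                q += adj[i][j] - k[i] * k[j] / two_m
    return q / two_m


def adjusted_rand(labels_a, labels_b):
    """Corrected (adjusted) Rand index between two partitions of the same items."""
    n = len(labels_a)
    sum_ij = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. both partitions all singletons
        return 1.0
    return (sum_ij - expected) / (max_index - expected)
```

Modularity scores a partition against the network alone, whereas the corrected Rand index scores it against a reference labeling; that difference is one plausible reason a method can do well on one measure and poorly on the other.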
The increasing availability of high-temporal-resolution neuroimaging data has intensified efforts to understand the dynamics of neural function. Until recently, there were few studies of generative models that support classification and prediction of neural systems, compared with studies describing their architecture. However, state-of-the-art methods for analyzing functional magnetic resonance imaging (fMRI), electroencephalography (EEG), and magnetoencephalography (MEG) data require collapsing the data spatially and temporally, which causes a loss of important information. In this study, we addressed this issue using a topological data analysis (TDA) method called Mapper, which visualizes evolving patterns of brain activity as a mathematical graph. Accordingly, we analyzed preprocessed MEG data from 83 subjects in the Human Connectome Project (HCP), collected during a working memory n-back task. We examined variation in the dynamics of brain states with the Mapper graphs and determined how this variation relates to measures such as response time and performance. Applying Mapper to the MEG data revealed a novel neuroimaging marker that explained the participants' performance along with the ground truth of response time. In addition, TDA enabled us to distinguish two task-positive brain activations during the 0-back and 2-back tasks, which are hard to detect with other pipelines that require collapsing the data in the spatial and temporal domains. Further, the Mapper graphs of individual subjects revealed one large group in the middle of the stimulus, indicating high engagement of the brain at fine temporal resolution; this could help increase spatiotemporal resolution when different imaging modalities are merged. Hence, our work provides further evidence of the effectiveness of TDA methods for extracting subtle dynamic properties of high-temporal-resolution MEG data without temporal and spatial collapse.
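The Mapper construction mentioned above can be sketched generically. This is a minimal illustration, not the study's configuration: each point stands in for one time frame of MEG data, the lens (filter) values are supplied by the caller, the cover is `n_intervals` equal-width intervals widened by an `overlap` fraction, and clustering inside each preimage is single linkage at a fixed threshold `eps`. All of these choices and the function names are assumptions.

```python
import math
from itertools import combinations


def _eps_components(idx, points, eps):
    """Single-linkage clusters: connected components of the eps-neighbourhood graph on points[idx]."""
    parent = {i: i for i in idx}

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    for i, j in combinations(idx, 2):
        if math.dist(points[i], points[j]) <= eps:
            parent[find(i)] = find(j)
    groups = {}
    for i in idx:
        groups.setdefault(find(i), set()).add(i)
    return list(groups.values())


def mapper(points, lens, n_intervals=4, overlap=0.25, eps=1.0):
    """Return Mapper nodes (sets of point indices) and edges between nodes that share points."""
    lo, hi = min(lens), max(lens)
    width = (hi - lo) / n_intervals or 1.0  # fall back to width 1 if the lens is constant
    nodes = []
    for k in range(n_intervals):
        # Overlapping interval of the cover, widened on both sides.
        a = lo + k * width - overlap * width
        b = lo + (k + 1) * width + overlap * width
        idx = [i for i, f in enumerate(lens) if a <= f <= b]
        nodes.extend(_eps_components(idx, points, eps))
    edges = {(u, v) for u, v in combinations(range(len(nodes)), 2) if nodes[u] & nodes[v]}
    return nodes, edges
```

For instance, ten equally spaced one-dimensional points with the coordinate itself as the lens and three intervals produce a path of three nodes, because consecutive overlapping intervals share one point each. The resulting graph is what Mapper-based analyses inspect for hubs or large groups of frames.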
Because of the regular dust storms triggered by recent climate change in the Middle East, dust mitigation has become a critical issue for solar energy harvesting devices. One way to minimize and prevent dust adhesion and to create self-cleaning abilities is to give surfaces hydrophobic characteristics. The purpose of this study is to explore the topological features of hydrophobic surfaces. We use non-standard techniques from topological data analysis to extract morphological features from atomic force microscopy (AFM) images. Our method recovers most of the previous qualitative observations in a robust and quantitative way. Persistence diagrams, which summarize topological structures, confirm quantitatively that the crystallized polycarbonate (PC) surface possesses spherulites, voids, and fibrils, and that the texture height and spherulite concentration increase with longer immersion periods. The approach also shows that the polydimethylsiloxane (PDMS) copied the PC surface structures faithfully, except that 80 to 90 percent of the nanofibrils were not copied to the PDMS surface. We next extract a feature vector from each persistence diagram and use principal component analysis (PCA) to show which experiments share features with similar variance. The K-means clustering algorithm is then applied to the matrix of feature vectors to support the PCA result by grouping experiments with similar features.
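The last two steps above, turning each persistence diagram into a feature vector and grouping the vectors with K-means, can be sketched as follows. This is a minimal illustration under assumptions: the three summary features (number of finite points, total persistence, maximum persistence) are hypothetical stand-ins for whatever features the study extracted, the PCA step is omitted, and K-means is plain Lloyd's algorithm with the first k vectors as deterministic initial centres.

```python
import math


def diagram_features(diagram):
    """Hypothetical summary of a persistence diagram given as (birth, death) pairs."""
    pers = [death - birth for birth, death in diagram if math.isfinite(death)]
    return (float(len(pers)), sum(pers), max(pers, default=0.0))


def kmeans(vectors, k, n_iter=100):
    """Lloyd's algorithm; the first k vectors serve as initial centres (illustrative choice)."""
    centres = [list(v) for v in vectors[:k]]
    labels = None
    for _ in range(n_iter):
        # Assign every vector to its nearest centre (Euclidean distance).
        new = [min(range(k), key=lambda c: math.dist(v, centres[c])) for v in vectors]
        if new == labels:  # assignments stable: converged
            break
        labels = new
        # Move each centre to the mean of its members; keep it if the cluster is empty.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centres
```

In practice the features would usually be standardized before clustering, since total persistence and point counts can live on very different scales; a library implementation with several random restarts would also be preferable to this single deterministic start.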