A Clustering Algorithm for Multi-Modal Heterogeneous Big Data With Abnormal Data

Yan, An; Wang, Wei; Ren, Yi; Geng, Hongwei

doi:10.3389/fnbot.2021.680613

Cited by 12 publications

(6 citation statements)

References 24 publications


“…Recently, other soft-clustering methods for multi-modal data are reported (Yan et al., 2021; Zhang et al., 2022). While their utility or effectiveness with multi-modal biomedical data remains unknown, they may provide an additional framework to the analysis of the multi-modal disease-omics data studied in this article.…”

Section: Discussion (mentioning)

confidence: 99%

Latent disease similarities and therapeutic repurposing possibilities uncovered by multi-modal generative topic modeling of human diseases

Kozawa, Yokoyama, Urayama, et al. 2023

Bioinformatics Advances


Motivation: Human diseases are characterized by multiple features such as their pathophysiological, molecular, and genetic changes. The rapid expansion of such multi-modal disease-omics space provides an opportunity to re-classify diverse human diseases and to uncover their latent molecular similarities, which could be exploited to repurpose a therapeutic-target for one disease to another. Results: Herein, we probe this underexplored space by soft-clustering 6,955 human diseases by multi-modal generative topic modeling. Focusing on chronic kidney disease and myocardial infarction, two most life-threatening diseases, unveiled are their previously underrecognized molecular similarities to neoplasia and mental/neurological-disorders, and 69 repurposable therapeutic-targets for these diseases. Using an edit-distance based pathway-classifier, we also find molecular pathways by which these targets could elicit their clinical effects. Importantly, for the 17 targets, the evidence for their therapeutic usefulness is retrospectively found in the pre-clinical and clinical space, illustrating the effectiveness of the method, and suggesting its broader applications across diverse human diseases. Availability: The code reported in this paper is available at: https://github.com/skozawa170301ktx/MultiModalDiseaseModeling Supplementary information: Supplementary data are available at Bioinformatics Advances online.

“…developed. For instance, new techniques of clustering analysis are being developed to deal with noise, outliers, and data sets with missing values (Song et al 2021; Yan et al 2021). These are the same limitations that cripple our efforts of chemical tagging.…”

Section: Discussion (mentioning)

confidence: 99%

The Gaia-ESO Survey: Chemical tagging in the thin disk

Spina, Magrini, Sacco, et al. 2022

A&A


Context. The chemical makeup of a star provides the fossil information of the environment where it formed. Under this premise, it should be possible to use chemical abundances to tag stars that formed within the same stellar association. This idea, known as chemical tagging, has not produced the expected results, especially within the thin disk where open stellar clusters have chemical patterns that are difficult to disentangle. Aims. The ultimate goal of this study is to probe the feasibility of chemical tagging within the thin disk population using high-quality data from a controlled sample of stars. We also aim at improving the existing techniques of chemical tagging and giving some kind of guidance on different strategies of clustering analysis in the elemental abundance space. Methods. Here we develop the first blind search of open clusters' members through clustering analysis in the elemental abundance space using the OPTICS algorithm applied to data from the Gaia-ESO survey. First, we evaluate different strategies of analysis (e.g., choice of the algorithm, data preprocessing techniques, metric, space of data clustering), determining which ones are more performing. Second, we apply these methods to a data set including both field stars and open clusters attempting a blind recovery of as many open clusters as possible. Results. We show how specific strategies of data analysis can improve the final results. Specifically, we demonstrate that open clusters can be more efficaciously recovered with the Manhattan metric and on a space whose dimensions are carefully selected. Using these (and other) prescriptions we are able to recover open clusters hidden in our data set and find new members of these stellar associations (i.e., escapers, binaries). Conclusions. Our results indicate that there are chances of recovering open clusters' members via clustering analysis in the elemental abundance space, albeit in a data set that has a very high fraction of cluster members compared to an average field star sample. Presumably, the performances of chemical tagging will further increase with higher quality data and more sophisticated clustering algorithms, which will likely become available in the near future.


“…Given a consensus partition matrix $G^{*} \in \mathbb{R}_{+}^{K \times N}$ and a set of local partitions $\mathcal{G} = \{G^{(1)}, G^{(2)}, \cdots, G^{(V)}\}$, we define the category utility function between $G^{*}$ and each $G^{(v)}$ with $1 \le v \le V$ as follows:…”

Section: Category Utility Function (mentioning)

confidence: 99%

“…generated by social networks users is changing rapidly. As data collections become highly diversified [1] due to the emergence of multi-modal data sets, multi-view data sets (i.e. the same data sample described in various ways) and dispersed data, it is now critical to effectively extract inherent information from these multi-source data sets.…”

Section: Introduction (mentioning)

confidence: 99%


Multi-modal Multi-view Clustering based on Non-negative Matrix Factorization

Khalafaoui, Grozavu, Mateï, et al. 2022

2022 IEEE Symposium Series on Computational Intelligence (SSCI)


By combining related objects, unsupervised machine learning techniques aim to reveal the underlying patterns in a data set. Non-negative Matrix Factorization (NMF) is a data mining technique that splits data matrices by imposing restrictions on the elements' non-negativity into two matrices: one representing the data partitions and the other representing the cluster prototypes of the data set. This method has attracted a lot of attention and is used in a wide range of applications, including text mining, clustering, language modeling, music transcription, and neuroscience (gene separation). The interpretation of the generated matrices is made simpler by the absence of negative values. In this article, we propose a study on multi-modal clustering algorithms and present a novel method called multi-modal multi-view non-negative matrix factorization, in which we analyze the collaboration of several local NMF models. The experimental results show the value of the proposed approach, which was evaluated using a variety of data sets, and the obtained results are very promising compared to state-of-the-art methods.
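The abstract above describes NMF as splitting a non-negative data matrix into a prototype matrix and a partition matrix. As a rough illustration only, the sketch below factorizes a toy matrix with the standard Lee-Seung multiplicative updates for the Frobenius objective. It is a single-view, plain NMF, not the multi-modal multi-view method proposed by the authors; the function names (`nmf`, `hard_labels`, `frobenius_error`), the toy data, and all parameter values are assumptions made for this example.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def nmf(x, k, iters=200, eps=1e-9, seed=0):
    """Approximate x (features x samples) as w @ h with w, h >= 0.

    w (features x k) plays the role of cluster prototypes and
    h (k x samples) the role of soft data partitions.
    Hypothetical helper: plain multiplicative updates, single view.
    """
    rng = random.Random(seed)
    m, n = len(x), len(x[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(m)]
    h = [[rng.random() + 0.1 for _ in range(n)] for _ in range(k)]
    for _ in range(iters):
        # h <- h * (w^T x) / (w^T w h)
        wt = transpose(w)
        num, den = matmul(wt, x), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)]
             for i in range(k)]
        # w <- w * (x h^T) / (w h h^T)
        ht = transpose(h)
        num, den = matmul(x, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(m)]
    return w, h


def frobenius_error(x, w, h):
    """Frobenius norm of the reconstruction residual x - w @ h."""
    approx = matmul(w, h)
    return sum((a - b) ** 2
               for ra, rb in zip(x, approx) for a, b in zip(ra, rb)) ** 0.5


def hard_labels(h):
    """Assign each sample (column of h) to its largest partition weight."""
    return [max(range(len(h)), key=lambda c: h[c][j]) for j in range(len(h[0]))]


# Toy data (made up): 4 features x 6 samples with two obvious groups.
x = [
    [5.0, 5.0, 5.0, 0.0, 0.0, 0.0],
    [4.0, 4.0, 4.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 3.0, 3.0, 3.0],
    [0.0, 0.0, 0.0, 6.0, 6.0, 6.0],
]
w, h = nmf(x, k=2)
print("cluster labels:", hard_labels(h))
print("reconstruction error:", frobenius_error(x, w, h))
```

Because both factors stay non-negative, each column of `h` can be read directly as a sample's soft membership in each cluster, which is the interpretability benefit the abstract mentions.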

