Misa Goudo scite author profile

The usefulness of sparse k-means in metabolomics data: An example from breast cancer data

In metabolomics data processing, multidimensional quantitative data from thousands of metabolites are often sparse; that is, only a small fraction of metabolites is relevant to the phenotype of interest. Clustering is therefore used to discover subtypes from omics data. Sparse processing, which selects important metabolites from the total omics data, is an effective clustering technique. This study investigated the effectiveness of sparse k-means for metabolomics data. Specifically, sparse k-means was used to cluster blood lipid metabolite data of breast cancer patients in two studies: (1) before and after menopause, and (2) pre- and postoperative chemotherapy. In both cases, sparse k-means showed discrimination accuracy comparable to that of k-means while using fewer metabolites. Furthermore, varying the L1-norm constraint value produced no significant change. The mean silhouette coefficients of sparse k-means and k-means were (1) 0.38 ± 0.14 (S.D.) and 0.17 ± 0.01, and (2) 0.38 ± 0.07 and 0.17 ± 0.01, respectively, indicating that feature selection using sparse k-means can improve clustering results. In addition, metabolite selection using sparse k-means was consistent regardless of the test data or the constraint value of the L1 norm, indicating robustness.
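
The abstract compares clusterings by their mean silhouette coefficient. As a rough illustration of what that number measures, the sketch below computes it from scratch for a toy data set. It assumes Euclidean distance (the abstract does not state the metric), the function names `silhouette_scores` and `mean_silhouette` are hypothetical, and the points are made up rather than taken from the study.

```python
import math


def euclid(p, q):
    """Euclidean distance between two equal-length coordinate sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def silhouette_scores(points, labels):
    """Return the silhouette coefficient s(i) = (b - a) / max(a, b) per sample."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    scores = []
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for a singleton cluster
            continue
        # a: mean distance to the other members of the sample's own cluster
        a = sum(euclid(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(euclid(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


def mean_silhouette(points, labels):
    """Average silhouette coefficient over all samples."""
    scores = silhouette_scores(points, labels)
    return sum(scores) / len(scores)


# Toy data (assumption): two tight, well-separated groups of two points each.
toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = mean_silhouette(toy_points, [0, 0, 1, 1])  # labels follow the groups
bad = mean_silhouette(toy_points, [0, 1, 0, 1])   # labels cut across the groups
print(good, bad)
```

Values near 1 indicate compact, well-separated clusters, and values near 0 or below indicate overlapping or misassigned samples, which is why a higher mean silhouette is read as a better clustering.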

Comparison of classification accuracy and feature selection between sparse and non-sparse modeling of metabolomics data

Toda, Goudo, Sugimoto, et al. 2023

Preprint

Machine learning methods such as multivariate analysis and clustering are frequently used to analyze metabolomics data. An open question in such analyses is how much the results of supervised and unsupervised learning models differ. Because metabolomics data include hundreds to thousands of metabolites, more than the number of samples, only a small fraction of metabolites is relevant to the phenotype of interest. For this reason, sparse mechanisms have been introduced into many machine learning models. However, the explanatory power of a model decreases when the number of explanatory variables is reduced to an extreme level. In this paper, serum lipidomic data of breast cancer patients (1) pre- and post-menopause and (2) before and after neoadjuvant chemotherapy were chosen as an example of metabolomics data. These data were analyzed by partial least squares (PLS) for regression and by K-means and hierarchical clustering for clustering, and the results were compared with those of sparse modeling. There was no significant difference in accuracy between non-sparse and sparse modeling. Metabolite subsets selected by sparse modeling were almost identical to the PLS-selected features, and several metabolites were consistently selected regardless of the algorithm used. These results contribute to exploring biomarkers in high-dimensional metabolomics datasets.

Demarcation Line Determination for Diagnosis of Gastric Cancer Disease Range Using Unsupervised Machine Learning in Magnifying Narrow-Band Imaging

et al. 2022

Background and Aims: It is important to determine an accurate demarcation line (DL) between cancerous lesions and background mucosa in magnifying narrow-band imaging (M-NBI)-based diagnosis. However, this is difficult for novice endoscopists. We aimed to determine an accurate DL automatically using a machine learning method. Methods: We used an unsupervised machine learning approach to determine the DLs. Our method consists of the following four steps: (1) an M-NBI image is segmented into superpixels using simple linear iterative clustering; (2) image features are extracted for each superpixel; (3) the superpixels are grouped into several clusters using the k-means method; and (4) the boundaries of the clusters are extracted as DL candidates. Twenty-three M-NBI images from 11 cases were used for performance evaluation. The evaluation assessed the similarity between the DLs identified by endoscopists and those identified by our method, measured as the Euclidean distance between the two DLs. For one of the 11 cases, a histopathological examination was also conducted to evaluate the proposed system. Results: The average Euclidean distances for the 11 cases were 10.65, 11.97, 7.82, 8.46, 8.59, 9.72, 12.20, 9.06, 22.86, 8.45, and 25.36. The results indicated that the proposed method could identify DLs similar to those identified by experienced doctors. Additionally, it was confirmed that the proposed system could generate pathologically valid DLs when the number of clusters was increased. Conclusions: Our proposed system can support the training of inexperienced doctors as well as enrich the knowledge of experienced doctors in endoscopy.
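
Steps (3) and (4) of the method described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the superpixel segmentation and feature extraction of steps (1) and (2) are already done, uses a tiny hand-made label map in place of real SLIC output, a single made-up feature per superpixel, and hypothetical function names (`kmeans`, `cluster_boundaries`).

```python
import math


def euclid(p, q):
    """Euclidean distance between two equal-length coordinate sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def kmeans(features, k, n_iter=50):
    """Plain Lloyd's k-means; returns one cluster index per feature vector."""
    # Deterministic start for this sketch: k evenly spaced samples as centres.
    step = len(features) / k
    centres = [tuple(features[int(i * step)]) for i in range(k)]
    assign = None
    for _ in range(n_iter):
        new_assign = [
            min(range(k), key=lambda c: euclid(f, centres[c])) for f in features
        ]
        if new_assign == assign:
            break  # assignments stopped changing
        assign = new_assign
        for c in range(k):
            members = [features[i] for i, a in enumerate(assign) if a == c]
            if members:  # an emptied cluster keeps its previous centre
                centres[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assign


def cluster_boundaries(superpixel_map, cluster_of):
    """Mark pixels whose right or lower neighbour lies in a different cluster."""
    height, width = len(superpixel_map), len(superpixel_map[0])
    cl = [[cluster_of[sp] for sp in row] for row in superpixel_map]
    boundary = [[False] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if x + 1 < width and cl[y][x] != cl[y][x + 1]:
                boundary[y][x] = True
            if y + 1 < height and cl[y][x] != cl[y + 1][x]:
                boundary[y][x] = True
    return boundary


# Toy stand-in for SLIC output (assumption): a 4x6 image with six superpixels.
superpixel_map = [
    [0, 0, 1, 1, 2, 2],
    [0, 0, 1, 1, 2, 2],
    [3, 3, 4, 4, 5, 5],
    [3, 3, 4, 4, 5, 5],
]
# One made-up feature per superpixel, e.g. a mean intensity.
features = [(0.10,), (0.15,), (0.90,), (0.12,), (0.20,), (0.85,)]

assign = kmeans(features, k=2)                          # step (3)
boundary = cluster_boundaries(superpixel_map, assign)   # step (4)
for row in boundary:
    print("".join("#" if b else "." for b in row))
```

In this toy case the marked pixels form a vertical line between the bright right-hand superpixels and the rest, which plays the role of a DL candidate. Raising `k` would produce more boundaries to choose from, in line with the abstract's remark about increasing the number of clusters.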

Misa Goudo

The usefulness of sparse k-means in metabolomics data: An example from breast cancer data

Comparison of classification accuracy and feature selection between sparse and non-sparse modeling of metabolomics data

Demarcation Line Determination for Diagnosis of Gastric Cancer Disease Range Using Unsupervised Machine Learning in Magnifying Narrow-Band Imaging
