2017
DOI: 10.1016/j.bdq.2017.07.001

*K-means and cluster models for cancer signatures

Abstract: We present the *K-means clustering algorithm and source code, expanding on the statistical clustering methods applied to quantitative finance in https://ssrn.com/abstract=2802753. *K-means is statistically deterministic without specifying initial centers, etc. We apply *K-means to extracting cancer signatures from genome data without using nonnegative matrix factorization (NMF). The computational cost of *K-means is a fraction of that of NMF. Using 1389 published samples for 14 cancer types, we find that 3 cancers (liver cancer, l…

citations
Cited by 31 publications
(7 citation statements)
references
References 71 publications
“…Molecular subtyping is considered as a favorable source of disease stratification ( Guinney et al, 2015 ; Sjodahl et al, 2017 ), while similar lncRNA-based subtyping is still lacking. With a k-means clustering analysis ( Kakushadze and Yu, 2017 ), we could divide 111 ESCC patients from TCGA into two molecular subgroups based on 50 lncRNA markers, and its performance was again confirmed in internal and external datasets. The lncRNA-based subtyping showed good clustering capability and could be treated as a potential tool for ESCC molecular subtyping.…”
Section: Discussion (mentioning)
confidence: 89%
“…Some of these applications are already being documented in the literature, as was done by Salma, (2016) that used a variation of K-means (fast K-means) to select the most relevant resources from a high-dimension breast cancer data set, reaching an accuracy 99.39%. In this context, it is also worth mentioning the study of Kakushadze and Yu, (2017), in which they used 1389 published samples of 14 types of cancer and found that 3 types of cancer (liver cancer, lung cancer and renal cell carcinoma) stand out from the others and had no similar structures to the cluster. In our study, using this same algorithm, we identified that two groups have the best explanatory capacity for our data, dividing them between patients and controls with a hit rate that reached 72.81% when analyzed using the t-sne.…”
Section: Discussion (mentioning)
confidence: 99%
“…The increasing availability of cell line data has allowed for widespread drug sensitivity prediction based on genetic profiles, and genomic data have played a key part in this effort. Pan cancer analysis and more specific interactions, such as the response to leucovorin, fluorouracil, and oxaliplatin in patients with colorectal cancer, have both made use of genomic information to predict clinical response measures [21,37,38].…”
Section: Predicting and Evaluating Treatment Response (mentioning)
confidence: 99%