2018
DOI: 10.1186/s12859-018-2495-5

Analyzing the similarity of samples and genes by MG-PCC algorithm, t-SNE-SS and t-SNE-SG maps

Abstract: Background: For analyzing gene expression data sets under different samples, clustering and visualizing samples and genes are important methods. However, it is difficult to integrate clustering and visualization techniques when the similarities of samples and genes are defined by the PCC (Pearson correlation coefficient) measure. Results: Here, for rare samples of gene expression data sets, we use the MG-PCC (mini-groups that are defined by PCC) algorithm to divide them into mini-groups, and use t-SNE-SSP maps to display…

Citations: cited by 5 publications (9 citation statements)
References: 26 publications
“…Hierarchical clustering and heat maps were used to display protein expression patterns [27] and t-SNE (t-statistic Stochastic Neighbor Embedding) maps of standardized samples were used to identify relations between samples [28, 29].…”
Section: Methods (mentioning)
confidence: 99%
“…The optimal filter threshold was established using the projection score. Qlucore Omics Explorer version 3.6 (Lund, Sweden) bioinformatics software was used for analysis. Hierarchical clustering and heat maps were used to display protein expression patterns and t-SNE (t-statistic Stochastic Neighbor Embedding) maps of standardized samples were used to identify relations between samples.…”
Section: Methods (mentioning)
confidence: 99%
“…For the i-th virus of data-s, we use X(i) to represent its complete genome, and F_k(i) to represent its F_k-feature, where … [MG-Euclidean and Icc-cluster algorithms] The MG-Euclidean algorithm does not directly divide viruses into clusters, but puts nearest-neighbor viruses into the same mini-groups [15]. That is, when a virus belongs to a mini-group, its nearest neighbors are also in that mini-group.…”
Section: F_k-Features (mentioning)
confidence: 99%
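
The nearest-neighbor rule quoted above (a sample and its nearest neighbor always share a mini-group) can be illustrated with a short sketch. This is not the authors' published MG-Euclidean or MG-PCC code; it is a minimal reading of the quoted rule, assuming each sample is linked to its single nearest neighbor and a mini-group is a connected set of such links. The function names `euclidean`, `pcc_distance`, and `mini_groups` are hypothetical, and using 1 - PCC as a dissimilarity is an assumption made for illustration.

```python
from math import sqrt


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pcc_distance(a, b):
    """Dissimilarity 1 - PCC (assumed here; PCC is the Pearson correlation)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return 1.0 - cov / sqrt(var_a * var_b)


def mini_groups(samples, dist=euclidean):
    """Group sample indices so each sample shares a group with its nearest neighbor."""
    n = len(samples)
    if n < 2:
        return [[i] for i in range(n)]

    parent = list(range(n))  # union-find forest over sample indices

    def find(i):
        # Walk to the root, shortening the path along the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        # Nearest neighbor of sample i among all other samples.
        nearest = min(
            (j for j in range(n) if j != i),
            key=lambda j: dist(samples[i], samples[j]),
        )
        root_i, root_j = find(i), find(nearest)
        if root_i != root_j:
            parent[root_j] = root_i  # merge the two mini-groups

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


points = [[0, 0], [0, 1], [10, 10], [10, 11], [20, 0]]
print(mini_groups(points))  # [[0, 1], [2, 3, 4]]
```

Passing `dist=pcc_distance` groups samples by correlation instead of Euclidean distance, which is closer in spirit to the PCC-based mini-groups named in the abstract.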
“…To search for appropriate methods able to separate phages from other virus families and to classify the different phage families, we constructed 4 data sets to verify our methods: Data-1 mixed phages with 9 other virus families, Data-2 contained 6 different phage families, Data-3 contained the 9 virus families of Data-1 with the phages removed, and Data-4 contained 5 different Ebolavirus families; Data-3 and Data-4 mainly illustrated the complexity of phages. Here, we used t-SNE maps to select efficient features, where t-SNE was able to map nearest-neighbor samples onto adjacent points on the plane [15,16]. That is, if the t-SNE projections of the different virus families were distributed in different regions, the features used were reliable for defining the similarity of viruses.…”
Section: Introduction (mentioning)
confidence: 99%