Proceedings of the 25th International Conference on World Wide Web 2016
DOI: 10.1145/2872427.2883041

Visualizing Large-scale and High-dimensional Data

Abstract: We study the problem of visualizing large-scale and high-dimensional data in a low-dimensional (typically 2D or 3D) space. Much success has been reported recently by techniques that first compute a similarity structure of the data points and then project them into a low-dimensional space with the structure preserved. These two steps suffer from considerable computational costs, preventing the state-of-the-art methods such as the t-SNE from scaling to large-scale and high-dimensional data (e.g., millions of data p…
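The two-step recipe the abstract refers to (build a similarity structure such as a k-NN graph, then optimize a low-dimensional layout that preserves it) can be illustrated with the minimal Python sketch below. It is not the authors' LargeVis implementation: the dataset, neighbor count, learning rate, and negative-sampling count are assumptions made for illustration.

    # Sketch of the two-step approach: (1) build a k-NN similarity graph,
    # (2) learn a 2-D layout that keeps graph neighbors close.
    # Illustrative only -- not the authors' LargeVis implementation;
    # k, the learning rate, and the negative-sample count are assumed values.
    import numpy as np
    from sklearn.datasets import load_digits
    from sklearn.neighbors import NearestNeighbors

    rng = np.random.default_rng(0)
    X = load_digits().data                       # (n_samples, n_features)
    n, k = X.shape[0], 15

    # Step 1: similarity structure as a k-NN edge list.
    _, idx = NearestNeighbors(n_neighbors=k + 1).fit(X).kneighbors(X)
    edges = np.array([(i, j) for i in range(n) for j in idx[i, 1:]])

    # Step 2: SGD layout with a heavy-tailed kernel and negative sampling:
    # pull linked points together, push randomly sampled pairs apart.
    Y = rng.normal(scale=1e-2, size=(n, 2))
    lr, n_neg = 0.1, 5
    for epoch in range(20):                      # unoptimized pure Python, keep it small
        rng.shuffle(edges)
        for i, j in edges:
            d = Y[i] - Y[j]
            w = 1.0 / (1.0 + d @ d)              # Student-t style kernel
            Y[i] -= lr * 2.0 * w * d             # attract graph neighbors
            Y[j] += lr * 2.0 * w * d
            for m in rng.integers(0, n, n_neg):  # repel sampled non-neighbors
                if m == i:
                    continue
                d = Y[i] - Y[m]
                w = 1.0 / (1.0 + d @ d)
                Y[i] += lr * 2.0 * w * d / (1e-3 + d @ d)
    # Y now holds a 2-D embedding that roughly preserves the k-NN structure.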

Cited by 318 publications (271 citation statements). References 21 publications (37 reference statements).
“…Our approach achieves significantly better scalability than t-SNE, which is on the verge of being impractical for datasets with more than a million cells. Although a number of recent studies introduced new techniques for improving the scalability of data visualization tools (Dzwinel and Wcisło, 2015; Tang et al, 2016), they do not address the lack of generalizability that net-SNE overcomes.…”
Section: Discussion (mentioning; confidence: 99%)
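The generalizability contrasted in this statement is the ability to place previously unseen cells into an existing embedding without recomputing it. A rough sketch of that idea, using an sklearn regressor as a stand-in rather than the actual net-SNE model; the dataset, architecture, and settings are assumptions:

    # Sketch of the "generalizability" idea: fit a parametric map from the
    # feature space to an existing 2-D embedding, then place unseen points
    # with it. A stand-in for the idea behind net-SNE, not its actual model;
    # the dataset, MLP architecture, and settings are assumptions.
    import numpy as np
    from sklearn.datasets import load_digits
    from sklearn.manifold import TSNE
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPRegressor

    X = load_digits().data
    X_ref, X_new = train_test_split(X, test_size=0.2, random_state=0)

    # Embed the reference set once (the expensive, non-generalizing step).
    Y_ref = TSNE(n_components=2, random_state=0).fit_transform(X_ref)

    # Learn features -> coordinates, then reuse the map for new points.
    mapper = MLPRegressor(hidden_layer_sizes=(128, 64), max_iter=1000,
                          random_state=0).fit(X_ref, Y_ref)
    Y_new = mapper.predict(X_new)                # (n_new, 2), no re-embedding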
“…Thus, we compared the results between using M versus using M in meaningful downstream visualization applications. Specifically, following previous studies, 17,18 we used t-distributed stochastic neighbor embedding (t-SNE), 19 an algorithm that can efficiently model high-dimensional objects as two-dimensional points, which makes it especially well-suited for visualizing our dataset. We generated our visualization by running t-SNE with default settings on the patient profile matrix M for the baseline and M for VisAGE.…”
Section: Discussion (mentioning; confidence: 99%)
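The workflow quoted here amounts to running t-SNE with default settings on a patient-by-feature matrix. A minimal sketch, with a random placeholder in place of the study's matrix M:

    # Sketch of the quoted workflow: t-SNE with default settings on a
    # patient-profile matrix M. The matrix below is a random placeholder,
    # since the actual M from the study is not available here.
    import numpy as np
    from sklearn.manifold import TSNE

    rng = np.random.default_rng(0)
    M = rng.normal(size=(500, 200))                  # patients x features (placeholder)
    coords = TSNE(n_components=2).fit_transform(M)   # (500, 2) points to plot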
“…However, it builds a k-NN network directly from the data, and then reduces the network to two dimensions without using external information. 18 Another study built upon LargeVis to visualize single cells, but still also directly computed embeddings from a k-NN network without utilizing external data. 17 Marlin et al visualized a pattern discovery model's clustering parameters in the context of EMR analysis.…”
Section: Related Work (mentioning; confidence: 99%)
“…To speed up the t-SNE analysis, one could use a multicore version that is available via the Rtsne.multicore package. Alternative algorithms, such as (Tang et al., 2016) (available via the largeVis package), can be used for dimensionality reduction of very large datasets without downsampling. Alternatively, the dimensionality reduction can be performed on the codes of the SOM, at a resolution specified by the user (see Figure 12).…”
Section: Cell Population Identification With FlowSOM and ConsensusClu… (mentioning; confidence: 99%)
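The last alternative in this quote, reducing the SOM codes rather than all cells, amounts to embedding a small set of prototype vectors. A Python sketch under that assumption, with k-means centroids standing in for the SOM codes (the quoted packages are R tools, so sizes and parameters here are illustrative):

    # Sketch of the last option in the quote: run the dimensionality reduction
    # on a small set of prototype ("code") vectors instead of every cell.
    # k-means centroids stand in for FlowSOM's SOM codes, since the quoted
    # tools are R packages; sizes and parameters are illustrative assumptions.
    import numpy as np
    from sklearn.cluster import MiniBatchKMeans
    from sklearn.manifold import TSNE

    rng = np.random.default_rng(0)
    cells = rng.normal(size=(200_000, 30))       # placeholder expression matrix

    # Summarize the cells with a user-chosen number of prototypes
    # (e.g., the equivalent of a 10x10 SOM grid).
    codes = MiniBatchKMeans(n_clusters=100, random_state=0).fit(cells).cluster_centers_

    # Embedding 100 prototypes is cheap; the full 200k cells would need
    # downsampling or a faster algorithm such as LargeVis.
    coords = TSNE(n_components=2, perplexity=30).fit_transform(codes)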