2017
DOI: 10.48550/arxiv.1708.03229
Preprint

Automatic Selection of t-SNE Perplexity

Yanshuai Cao,
Luyu Wang

Abstract: t-Distributed Stochastic Neighbor Embedding (t-SNE) is one of the most widely used dimensionality reduction methods for data visualization, but it has a perplexity hyperparameter that requires manual selection. In practice, proper tuning of t-SNE perplexity requires users to understand the inner working of the method as well as to have hands-on experience. We propose a model selection objective for t-SNE perplexity that requires negligible extra computation beyond that of the t-SNE itself. We empirically valid…
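The abstract describes a model selection objective that scores each candidate perplexity using quantities t-SNE already computes. A minimal sketch of that idea is below, assuming a pseudo-BIC-style criterion of the form S(perp) = 2·KL + log(n)·perp/n; the exact penalty is an assumption here (the abstract is truncated), and the `select_perplexity` helper is hypothetical. It relies only on scikit-learn's `TSNE`, whose fitted `kl_divergence_` attribute exposes the final KL divergence, so the scan adds essentially no computation beyond the t-SNE runs themselves.

```python
import numpy as np
from sklearn.manifold import TSNE

def select_perplexity(X, candidates=(5, 15, 30, 50)):
    """Score each candidate perplexity and return the best one.

    Hypothetical helper: scores use a pseudo-BIC-style criterion,
    2 * KL(P||Q) + log(n) * perplexity / n, which penalizes larger
    perplexities while rewarding a better KL fit.
    """
    n = X.shape[0]
    scores = {}
    for perp in candidates:
        tsne = TSNE(perplexity=perp, init="random", random_state=0)
        tsne.fit_transform(X)
        # kl_divergence_ is the final KL divergence of the embedding
        scores[perp] = 2 * tsne.kl_divergence_ + np.log(n) * perp / n
    return min(scores, key=scores.get), scores

# Toy data: two well-separated Gaussian clusters in 5 dimensions
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, size=(40, 5)) for c in (0.0, 5.0)])
best, scores = select_perplexity(X)
```

Because each candidate requires one full t-SNE fit, in practice one would scan a small grid of perplexities rather than a fine one.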

Cited by 14 publications (17 citation statements) | References 3 publications
“…For an illustration of cluster separation, we plot the t-distributed stochastic neighbor embedding (t-SNE) of a random sample of 4400 devices in Figure 12 with perplexity chosen as in [21]. The clusters appear to be well separated with only minimal overlap, especially the single-company dominated cluster 1, thus reinforcing the clustering results.…”
Section: Temporal Traffic Spectrum and Clustering Analysis
confidence: 56%
“…Scikit-learn package defaults for v0.24.1, which are perplexity = 30, no exaggeration, and learning rate = 200 (note that we actually set it to 800 because we generate all embeddings using the OpenTSNE package, and in OpenTSNE the learning rate definition is 4 times smaller than in scikit-learn).…”
Section: Discussion
confidence: 99%
“…Because t-SNE has proven so popular, many researchers and software library authors have worked to identify guidelines for using t-SNE and selecting its hyperparameters [1,5,4,7,24,2,6]. Initially, researchers thought that t-SNE was robust to hyperparameter values, in particular for perplexity [1], but gradually research showed this is not completely true [4,6,5].…”
Section: Identifying Good t-SNE Hyperparameters
confidence: 99%