Machine Learning in Python: Main developments and technology trends in data science, machine learning, and artificial intelligence

Raschka, Sebastian; Patterson, Joshua T.; Nolet, Corey

doi:10.48550/arxiv.2002.04803

Cited by 13 publications

(14 citation statements)

References 0 publications


“…We perform k-Means clustering and learn 1000 cluster centroids (as many as the concepts in each level) on the training sets. We use the cuML k-Means implementation [47], repeat clustering with 3 seeds and compute cluster assignments for test samples using the centroids that gave the lowest inertia across the 3 runs. We report clustering metrics, including cluster purity and adjusted [60] and normalized mutual information scores between the cluster assignments and the labels of the test samples.…”

Section: Ig-1b (mentioning)

confidence: 99%
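The procedure in the statement above (cluster the training set with several seeds, keep the centroids with the lowest inertia, assign test samples to those centroids, then score the assignment against labels) can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses a small pure-Python k-means as a stand-in rather than the cuML implementation the citing authors used, the toy data and the helper names (`sq_dist`, `assign`, `kmeans`, `purity`) are hypothetical, and only cluster purity is computed, not the adjusted or normalized mutual information scores.

```python
import random
from collections import Counter

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points]

def kmeans(points, k, seed, n_iter=20):
    """Plain Lloyd's algorithm (stand-in, not cuML); returns (centroids, inertia)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # seed-dependent initialization
    for _ in range(n_iter):
        labels = assign(points, centroids)
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    labels = assign(points, centroids)
    inertia = sum(sq_dist(p, centroids[l]) for p, l in zip(points, labels))
    return centroids, inertia

def purity(cluster_ids, labels):
    """Fraction of samples whose label equals their cluster's majority label."""
    per_cluster = {}
    for c, y in zip(cluster_ids, labels):
        per_cluster.setdefault(c, Counter())[y] += 1
    return sum(max(cnt.values()) for cnt in per_cluster.values()) / len(labels)

# Hypothetical toy data: two well-separated groups.
train = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
test = [(0.05, 0.1), (5.05, 5.0), (0.15, 0.0), (4.95, 5.1)]
test_labels = ["a", "b", "a", "b"]

# Repeat clustering with 3 seeds and keep the run with the lowest inertia.
runs = [kmeans(train, k=2, seed=s) for s in (0, 1, 2)]
best_centroids, best_inertia = min(runs, key=lambda r: r[1])

# Assign test samples to the selected centroids and score against the labels.
test_clusters = assign(test, best_centroids)
test_purity = purity(test_clusters, test_labels)
print(f"best inertia: {best_inertia:.4f}, test purity: {test_purity:.2f}")
```

Selecting by inertia on the training set keeps the test labels out of model selection; the labels are used only for the final score.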

Concept Generalization in Visual Representation Learning

Sariyildiz, Kalantidis, Larlus et al. 2020

Preprint


Measuring concept generalization, i.e., the extent to which models trained on a set of (seen) visual concepts can be used to recognize a new set of (unseen) concepts, is a popular way of evaluating visual representations, especially when they are learned with self-supervised learning. Nonetheless, the choice of which unseen concepts to use is usually made arbitrarily, and independently from the seen concepts used to train representations, thus ignoring any semantic relationships between the two. In this paper, we argue that semantic relationships between seen and unseen concepts affect generalization performance and propose ImageNet-CoG, a novel benchmark on the ImageNet dataset that enables measuring concept generalization in a principled way. Our benchmark leverages expert knowledge that comes from WordNet in order to define a sequence of unseen ImageNet concept sets that are semantically more and more distant from the ImageNet-1K subset, a ubiquitous training set. This allows us to benchmark visual representations learned on ImageNet-1K out-of-the-box: we analyse a number of such models from supervised, semi-supervised and self-supervised approaches under the prism of concept generalization, and show how our benchmark is able to uncover a number of interesting insights.


“…An excellent general overview that digs deeper into the mathematical background than this review is the "high-bias, low variance introduction to Machine Learning" by Mehta et al. [7], recent applications of ML to materials science are covered by Schmidt et al. [30] But also many textbooks cover the fundamentals of machine learning; e.g., Tibshirani and Friedman [31], Shalev-Shwartz and Ben-David [32] as well as Bishop (from a more Bayesian point of view) [33] focus more on the theoretical background of statistical learning, whereas Géron provides a "how-to" for the actual implementation, also of neural network (NN) architectures, using popular Python frameworks [34], which were recently reviewed by Raschka et al. [35]…”

Section: Discussion (mentioning)

confidence: 99%

Big-Data Science in Porous Materials: Materials Genomics and Machine Learning

Jablonka, Ongari, Moosavi et al. 2020

Preprint


By combining metal nodes with organic linkers we can potentially synthesize millions of possible metal-organic frameworks (MOFs). The fact that we have so many materials opens many exciting avenues, but also creates new challenges. We simply have too many materials to be processed using conventional, brute-force methods. In this review, we show that having so many materials allows us to use big-data methods as a powerful technique to study these materials and to discover complex correlations. The first part of the review gives an introduction to the principles of big-data science. We show how to select appropriate training sets, survey approaches that are used to represent these materials in feature space, review different learning architectures, as well as evaluation and interpretation strategies. In the second part, we review how the different approaches of machine learning have been applied to porous materials. In particular, we discuss applications in the field of gas storage and separation, the stability of these materials, their electronic properties, and their synthesis. Given the increasing interest of the scientific community in machine learning, we expect this list to rapidly expand in the coming years.


“…To accelerate the experiments, we implement the entire evaluation pipeline on GPU utilizing the RAPIDS GPU data science framework. Data loading and preprocessing are boosted by cuDF [21] while scikit-learn models and scorers are replaced with their GPU counterparts in cuML library [11].…”

Section: GPU Accelerated Exhaustive Search (mentioning)

confidence: 99%

“…• A suite of GPU-optimized cuML [11] models including scikit-learn counterparts, MLPs and Xgboost [12] are added to the Bayesmark toolkit to accelerate single model evaluation.…”

Section: Introduction (mentioning)

confidence: 99%

GPU Accelerated Exhaustive Search for Optimal Ensemble of Black-Box Optimization Algorithms

Liu, Tunguz, Titericz 2020

Preprint


Black-box optimization is essential for tuning complex machine learning algorithms which are easier to experiment with than to understand. In this paper, we show that a simple ensemble of black-box optimization algorithms can outperform any single one of them. However, searching for such an optimal ensemble requires a large number of experiments. We propose a Multi-GPU-optimized framework to accelerate a brute force search for the optimal ensemble of black-box optimization algorithms by running many experiments in parallel. The lightweight optimizations are performed by CPU while expensive model training and evaluations are assigned to GPUs. We evaluate 15 optimizers by training 2.7 million models and running 541,440 optimizations. On a DGX-1, the search time is reduced from more than 10 days on two 20-core CPUs to less than 24 hours on 8 GPUs. With the optimal ensemble found by GPU-accelerated exhaustive search, we won 2nd place in the NeurIPS 2020 black-box optimization challenge.
