Groundbreaking language-vision architectures like CLIP and DALL-E proved the utility of training on large amounts of noisy image-text data, without relying on the expensive, accurate labels used in standard unimodal supervised vision learning. The resulting models showed strong text-guided image generation and transfer to downstream tasks, while performing remarkably well at zero-shot classification with noteworthy out-of-distribution robustness. Since then, large-scale language-vision models like ALIGN, BASIC, GLIDE, Flamingo and Imagen have made further improvements. Studying the training and capabilities of such models requires datasets containing billions of image-text pairs. Until now, no datasets of this size have been made openly available to the broader research community. To address this problem and democratize research on large-scale multi-modal models, we present LAION-5B, a dataset consisting of 5.85 billion CLIP-filtered image-text pairs, of which 2.32 billion contain English-language text. We show successful replication and fine-tuning of foundational models like CLIP, GLIDE and Stable Diffusion using the dataset, and discuss further experiments enabled by an openly available dataset of this scale. Additionally, we provide several nearest neighbor indices, an improved web interface for dataset exploration and subset generation, and detection scores for watermark, NSFW, and toxic content.
Project page: https://laion.ai/laion-5b-a-new-era-of-open-large-scale-multi-modal-datasets/
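One plausible reading of "CLIP-filtered" is that a pair is kept only when the CLIP embedding of the image and the CLIP embedding of its text are sufficiently similar. The sketch below illustrates that idea with plain Python lists standing in for precomputed embeddings. The function names, the tuple layout, and the default threshold of 0.3 are illustrative assumptions, not values or interfaces taken from the abstract.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical record layout: (image_embedding, text_embedding, caption).
Pair = Tuple[Sequence[float], Sequence[float], str]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_pairs(pairs: List[Pair], threshold: float = 0.3) -> List[str]:
    """Keep captions whose image/text embeddings reach the similarity threshold.

    The threshold default is an assumption chosen for illustration only.
    """
    kept = []
    for image_emb, text_emb, caption in pairs:
        if cosine_similarity(image_emb, text_emb) >= threshold:
            kept.append(caption)
    return kept


if __name__ == "__main__":
    # Toy embeddings: the first pair is aligned, the second is orthogonal.
    toy_pairs = [
        ([1.0, 0.0], [1.0, 0.0], "a photo of a cat"),
        ([1.0, 0.0], [0.0, 1.0], "click here to subscribe"),
    ]
    print(filter_pairs(toy_pairs))
```

In practice the embeddings would come from a CLIP image encoder and text encoder; here they are hand-written vectors so the filtering step can be read in isolation.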
Constructive machine learning aims at finding one or more instances of a domain that exhibit some desired properties. Such a process bears a strong similarity to a design process, where the ultimate objective is the generation of previously unknown, novel objects by using knowledge about known objects. The aim of the present work is to bring ideas from design theory to machine learning and to elaborate an experimental procedure that allows the study of design through machine learning approaches. To this end, we propose an actionable definition of creativity as the generation of out-of-distribution novelty. We assess several metrics designed for evaluating the quality of generative models on this new task. Through extensive experiments on various types of generative models, we find architectures and hyperparameter combinations which lead to out-of-distribution novelty. Such generators can then be used to search a semantically richer and broader space than standard generative models would allow.
* This paper is an adapted version of our submission to ICLR17.
30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain.