Jenia Jitsev scite author profile

Groundbreaking language-vision architectures like CLIP and DALL-E proved the utility of training on large amounts of noisy image-text data, without relying on expensive accurate labels used in standard vision unimodal supervised learning. The resulting models showed capabilities of strong text-guided image generation and transfer to downstream tasks, while performing remarkably at zero-shot classification with noteworthy out-of-distribution robustness. Since then, large-scale language-vision models like ALIGN, BASIC, GLIDE, Flamingo and Imagen made further improvements. Studying the training and capabilities of such models requires datasets containing billions of image-text pairs. Until now, no datasets of this size have been made openly available for the broader research community. To address this problem and democratize research on large-scale multi-modal models, we present LAION-5B -a dataset consisting of 5.85 billion CLIP-filtered image-text pairs, of which 2.32B contain English language. We show successful replication and fine-tuning of foundational models like CLIP, GLIDE and Stable Diffusion using the dataset, and discuss further experiments enabled with an openly available dataset of this scale. Additionally we provide several nearest neighbor indices, an improved web-interface for dataset exploration and subset generation, and detection scores for watermark, NSFW, and toxic content detection. 1 1 Project page: https://laion.ai/laion-5b-a-new-era-of-open-large-scale-multi-modal-datasets/

show abstract

Using physics-informed enhanced super-resolution generative adversarial networks for subfilter modeling in turbulent reactive flows

Bode

Gauding

Lian

et al. 2021

Proceedings of the Combustion Institute

View full text Add to dashboard Cite

Hierarchical information-based clustering for connectivity-based cortex parcellation

Gorbach

Schütte²,

Melzer

et al. 2011

Front. Neuroinform.

View full text Add to dashboard Cite

One of the most promising avenues for compiling connectivity data originates from the notion that individual brain regions maintain individual connectivity profiles; the functional repertoire of a cortical area (“the functional fingerprint”) is closely related to its anatomical connections (“the connectional fingerprint”) and, hence, a segregated cortical area may be characterized by a highly coherent connectivity pattern. Diffusion tractography can be used to identify borders between such cortical areas. Each cortical area is defined based upon a unique probabilistic tractogram and such a tractogram is representative of a group of tractograms, thereby forming the cortical area. The underlying methodology is called connectivity-based cortex parcellation and requires clustering or grouping of similar diffusion tractograms. Despite the relative success of this technique in producing anatomically sensible results, existing clustering techniques in the context of connectivity-based parcellation typically depend on several non-trivial assumptions. In this paper, we embody an unsupervised hierarchical information-based framework to clustering probabilistic tractograms that avoids many drawbacks offered by previous methods. Cortex parcellation of the inferior frontal gyrus together with the precentral gyrus demonstrates a proof of concept of the proposed method: The automatic parcellation reveals cortical subunits consistent with cytoarchitectonic maps and previous studies including connectivity-based parcellation. Further insight into the hierarchically modular architecture of cortical subunits is given by revealing coarser cortical structures that differentiate between primary as well as premotoric areas and those associated with pre-frontal areas.

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Jenia Jitsev

LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs

LAION-5B: An open large-scale dataset for training next generation image-text models

Using physics-informed enhanced super-resolution generative adversarial networks for subfilter modeling in turbulent reactive flows

Hierarchical information-based clustering for connectivity-based cortex parcellation

Contact Info

Product

Resources

About