We present the Cosmology and Astrophysics with MachinE Learning Simulations (CAMELS) project. CAMELS is a suite of 4233 cosmological simulations of 25 h − 1 Mpc 3 volume each: 2184 state-of-the-art (magneto)hydrodynamic simulations run with the AREPO and GIZMO codes, employing the same baryonic subgrid physics as the IllustrisTNG and SIMBA simulations, and 2049 N-body simulations. The goal of the CAMELS project is to provide theory predictions for different observables as a function of cosmology and astrophysics, and it is the largest suite of cosmological (magneto)hydrodynamic simulations designed to train machine-learning algorithms. CAMELS contains thousands of different cosmological and astrophysical models by way of varying Ω m , σ 8, and four parameters controlling stellar and active galactic nucleus feedback, following the evolution of more than 100 billion particles and fluid elements over a combined volume of ( 400 h − 1 Mpc ) 3 . We describe the simulations in detail and characterize the large range of conditions represented in terms of the matter power spectrum, cosmic star formation rate density, galaxy stellar mass function, halo baryon fractions, and several galaxy scaling relations. We show that the IllustrisTNG and SIMBA suites produce roughly similar distributions of galaxy properties over the full parameter space but significantly different halo baryon fractions and baryonic effects on the matter power spectrum. This emphasizes the need for marginalizing over baryonic effects to extract the maximum amount of information from cosmological surveys. We illustrate the unique potential of CAMELS using several machine-learning applications, including nonlinear interpolation, parameter estimation, symbolic regression, data generation with Generative Adversarial Networks, dimensionality reduction, and anomaly detection.
We present the Cosmology and Astrophysics with Machine Learning Simulations (CAMELS) Multifield Data set (CMD), a collection of hundreds of thousands of 2D maps and 3D grids containing many different properties of cosmic gas, dark matter, and stars from more than 2000 distinct simulated universes at several cosmic times. The 2D maps and 3D grids represent cosmic regions that span ∼100 million light-years and have been generated from thousands of state-of-the-art hydrodynamic and gravity-only N-body simulations from the CAMELS project. Designed to train machine-learning models, CMD is the largest data set of its kind containing more than 70 TB of data. In this paper we describe CMD in detail and outline a few of its applications. We focus our attention on one such task, parameter inference, formulating the problems we face as a challenge to the community. We release all data and provide further technical details at https://camels-multifield-dataset.readthedocs.io.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.