Abstract: Understanding the star-formation properties of galaxies as a function of cosmic epoch is a critical exercise in studies of galaxy evolution. Traditionally, stellar population synthesis models have been used to obtain best fit parameters that characterise star formation in galaxies. As multiband flux measurements become available for thousands of galaxies, an alternative approach to characterising star formation using machine learning becomes feasible. In this work, we present the use of deep learning technique…
“…A supervised Neural Network is trained for regression between photometric values and stellar population properties. For example, Surana et al (2020) used fully connected Artificial Neural Networks applied to data from the GAMA survey to derive stellar masses, star formation rates and dust properties of galaxies.…”
Section: Stellar Populations Star Formation Histories (mentioning)
The amount and complexity of data delivered by modern galaxy surveys have been steadily increasing over the past years. New facilities will soon provide imaging and spectra of hundreds of millions of galaxies. Extracting coherent scientific information from these large and multi-modal data sets remains an open issue for the community, and data-driven approaches such as deep learning have rapidly emerged as a potentially powerful solution to some long-lasting challenges. This enthusiasm is reflected in an unprecedented exponential growth of publications using neural networks, which have gone from a handful of works in 2015 to an average of one paper per week in 2021 in the area of galaxy surveys. Half a decade after the first published work in astronomy mentioning deep learning, and shortly before new big data sets such as Euclid and LSST start becoming available, we believe it is timely to review what has been the real impact of this new technology in the field and its potential to solve key challenges raised by the size and complexity of the new datasets. The purpose of this review is thus two-fold. We first aim at summarising, in a common document, the main applications of deep learning for galaxy surveys that have emerged so far. We then extract the major achievements and lessons learned and highlight key open questions and limitations, which, in our opinion, will require particular attention in the coming years. Overall, state-of-the-art deep learning methods are being rapidly adopted by the astronomical community, reflecting a democratisation of these methods. This review shows that the majority of works using deep learning to date are oriented to computer vision tasks (e.g. classification, segmentation). This is also the domain of application where deep learning has brought the most important breakthroughs so far.
However, we also report that the applications are becoming more diverse and deep learning is used for estimating galaxy properties, identifying outliers or constraining the cosmological model. Most of these works remain at the exploratory level, though, which could partially explain the limited impact in terms of citations. Some common challenges will most likely need to be addressed before moving to the next phase of massive deployment of deep learning in the processing of future surveys; for example, uncertainty quantification, interpretability, data labelling and domain shift issues from training with simulations, which constitutes a common practice in astronomy.
“…Stensbo-Smidt et al (2017) estimated specific star formation rates (sSFRs) and redshifts using broad-band photometry from SDSS (Sloan Digital Sky Survey, Eisenstein et al (2011)). Surana et al (2020) used CNNs with multiband flux measurements from the GAMA (Galaxy and Mass Assembly, Driver et al (2009)) survey to predict galaxy stellar mass, star formation rate, and dust luminosity. Simet et al (2019) used neural networks trained on semi-analytic catalogs tuned to the CANDELS (Cosmic Assembly Near-infrared Deep Extragalactic Legacy Survey, Grogin et al (2011)) survey to predict stellar mass, metallicity, and average star formation rate.…”
Section: Machine Learning and SED Fitting (mentioning)
Traditional spectral energy distribution (SED) fitting codes used to derive galaxy physical properties are often uncertain at the factor-of-a-few level owing to uncertainties in galaxy star formation histories and dust attenuation curves. Beyond this, Bayesian fitting (which is typically used in SED fitting software) is an intrinsically compute-intensive task, often requiring access to expensive hardware for long periods of time. To overcome these shortcomings, we have developed mirkwood: a user-friendly tool comprising an ensemble of supervised machine learning-based models capable of non-linearly mapping galaxy fluxes to their properties. By stacking multiple models, we marginalize against any individual model's poor performance in a given region of the parameter space. We demonstrate mirkwood's significantly improved performance over traditional techniques by training it on a combined data set of mock photometry of z=0 galaxies from the Simba, EAGLE and IllustrisTNG cosmological simulations, and comparing the derived results with those obtained from traditional SED fitting techniques. mirkwood is also able to account for uncertainties arising both from intrinsic noise in observations, and from finite training data and incorrect modeling assumptions. To increase the added value to the observational community, we use Shapley value explanations (SHAP) to fairly evaluate the relative importance of different bands to understand why particular predictions were reached. We envisage mirkwood to be an evolving, open-source framework that will provide highly accurate physical properties from observations of galaxies as compared to traditional SED fitting.
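As an illustration of how an ensemble can expose the uncertainty that comes from finite training data, here is a minimal sketch. It is not mirkwood's implementation: it swaps in a bootstrap ensemble of simple least-squares line fits, and the data (`log_flux`, `log_mass`), the function names, and the toy relation between them are all hypothetical. The ensemble mean serves as the prediction and the spread across members as a rough uncertainty estimate.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    mx = statistics.fmean(xs)
    my = statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def bootstrap_ensemble(xs, ys, n_models=50, seed=0):
    """Fit one line per bootstrap resample (sampling with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    while len(models) < n_models:
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        if len(set(bx)) < 2:  # skip degenerate resamples with a single x value
            continue
        by = [ys[i] for i in idx]
        models.append(fit_line(bx, by))
    return models


def predict(models, x):
    """Ensemble mean and member-to-member standard deviation at x."""
    preds = [a + b * x for a, b in models]
    return statistics.fmean(preds), statistics.stdev(preds)


# Synthetic toy data (an assumption for illustration, not real photometry):
# log stellar mass depends linearly on log flux, plus Gaussian scatter.
rng = random.Random(42)
log_flux = [rng.uniform(0.0, 3.0) for _ in range(200)]
log_mass = [8.0 + 1.0 * f + rng.gauss(0.0, 0.2) for f in log_flux]

models = bootstrap_ensemble(log_flux, log_mass)
mean_in, std_in = predict(models, 1.5)     # inside the training range
mean_out, std_out = predict(models, 10.0)  # far outside the training range
print(f"inside:  {mean_in:.2f} +/- {std_in:.3f}")
print(f"outside: {mean_out:.2f} +/- {std_out:.3f}")
```

In this sketch the spread grows for inputs far from the training data, which is the behaviour one would hope for from an uncertainty tied to limited training coverage. It does not capture observational noise or errors in the modeling assumptions.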
“…Another way to estimate the SFR is to rely on machine learning techniques (ML). For example, a galaxy catalog can be processed through a neural network previously trained with a sample of objects whose physical properties are already known (Davidzon et al 2019; Surana et al 2020; Gilda et al 2021; Simet et al 2021). This means that the targets can be compared to other observed galaxies, instead of synthetic templates, with the advantage of adhering more coherently to the observational parameter space.…”
We present a novel method for estimating galaxy physical properties from spectral energy distributions (SEDs) as an alternative to template fitting techniques and based on self-organizing maps (SOMs) to learn the high-dimensional manifold of a photometric galaxy catalog. The method has previously been tested with hydrodynamical simulations in Davidzon et al. (2019, MNRAS, 489, 4817); however, here it is applied to real data for the first time. It is crucial for its implementation to build the SOM with a high-quality panchromatic data set, thus we selected the “COSMOS2020” galaxy catalog for this purpose. After the training and calibration steps with COSMOS2020, other galaxies can be processed through SOMs to obtain an estimate of their stellar mass and star formation rate (SFR). Both quantities resulted in a good agreement with independent measurements derived from a more extended photometric baseline and, in addition, their combination (i.e., the SFR vs. stellar mass diagram) shows a main sequence of star-forming galaxies that is consistent with the findings of previous studies. We discuss the advantages of this method compared to traditional SED fitting, highlighting the impact of replacing the usual synthetic templates with a collection of empirical SEDs built by the SOM in a “data-driven” way. Such an approach also allows, even for extremely large data sets, for an efficient visual inspection to identify photometric errors or peculiar galaxy types. While also considering the computational speed of this new estimator, we argue that it will play a valuable role in the analysis of oncoming large-area surveys such as Euclid or the Legacy Survey of Space and Time at the Vera C. Rubin Telescope.
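The train, calibrate and estimate steps described in this abstract can be sketched with a very small self-organizing map. This is a toy illustration under stated assumptions, not the authors' code: the grid size, learning-rate and neighbourhood schedules, the two synthetic "colours", and the linear colour-mass relation are all hypothetical choices made to keep the example short.

```python
import math
import random


def best_matching_unit(weights, vec):
    """Index of the cell whose weight vector is closest to vec."""
    return min(range(len(weights)),
               key=lambda i: sum((w - v) ** 2 for w, v in zip(weights[i], vec)))


def train_som(data, rows=4, cols=4, epochs=20, seed=0):
    """Train a rows x cols SOM; returns one weight vector per cell."""
    rng = random.Random(seed)
    # Initialise each cell from a randomly chosen data point.
    weights = [list(rng.choice(data)) for _ in range(rows * cols)]
    coords = [(i // cols, i % cols) for i in range(rows * cols)]
    n_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = data[:]
        rng.shuffle(order)
        for vec in order:
            frac = step / n_steps
            lr = 0.5 * (1.0 - frac) + 0.01                      # decaying learning rate
            sigma = max(rows, cols) / 2.0 * (1.0 - frac) + 0.5  # shrinking neighbourhood
            br, bc = coords[best_matching_unit(weights, vec)]
            for i, (r, c) in enumerate(coords):
                d2 = (r - br) ** 2 + (c - bc) ** 2
                h = math.exp(-d2 / (2.0 * sigma ** 2))
                weights[i] = [w + lr * h * (v - w) for w, v in zip(weights[i], vec)]
            step += 1
    return weights


def calibrate(weights, data, labels):
    """Mean label (e.g. log stellar mass) of the calibration objects in each cell."""
    sums = {}
    for vec, lab in zip(data, labels):
        cell = best_matching_unit(weights, vec)
        total, count = sums.get(cell, (0.0, 0))
        sums[cell] = (total + lab, count + 1)
    return {cell: total / count for cell, (total, count) in sums.items()}


def estimate(weights, cell_labels, vec):
    """Label of the cell a new object falls in; None if that cell is uncalibrated."""
    return cell_labels.get(best_matching_unit(weights, vec))


# Synthetic calibration sample (an assumption, not COSMOS2020 data):
# two colours in [0, 1], with log mass increasing with the first colour.
rng = random.Random(1)
colours = [(rng.random(), rng.random()) for _ in range(300)]
log_mass = [9.0 + 2.0 * c1 + rng.gauss(0.0, 0.1) for c1, _ in colours]

som = train_som(colours)
cell_mass = calibrate(som, colours, log_mass)
est_blue = estimate(som, cell_mass, (0.1, 0.5))
est_red = estimate(som, cell_mass, (0.9, 0.5))
print(est_blue, est_red)
```

Each new object simply inherits the mean property of the calibration objects that share its cell, which is why the quality of the calibration catalog matters so much for this kind of estimator.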