Testing theories of hierarchical structure formation requires estimating the distribution of galaxy morphologies and its change with redshift. One aspect of this investigation involves identifying galaxies with disturbed morphologies (e.g., merging galaxies). This is often done by summarizing galaxy images using, e.g., the CAS and Gini-M20 statistics of Conselice (2003) and Lotz et al. (2004), respectively, and associating particular statistic values with disturbance. We introduce three statistics that enhance detection of disturbed morphologies at high redshift (z ∼ 2): the multi-mode (M), intensity (I), and deviation (D) statistics. We show their effectiveness by training a machine-learning classifier, random forest, using 1,639 galaxies observed in the H-band by the Hubble Space Telescope WFC3, galaxies that had been previously classified by eye by the CANDELS collaboration (Grogin et al. 2011, Koekemoer et al. 2011). We find that the MID statistics (and the A statistic of Conselice 2003) are the most useful for identifying disturbed morphologies. We also explore whether human annotators are useful for identifying disturbed morphologies. We demonstrate that they show limited ability to detect disturbance at high redshift, and that increasing their number beyond ≈10 does not provably yield better classification performance. We propose a simulation-based model-fitting algorithm that mitigates these issues by bypassing annotation.
The development of fast and accurate methods of photometric redshift estimation is a vital step towards being able to fully utilize the data of next-generation surveys within precision cosmology. In this paper, we apply a specific approach to spectral connectivity analysis (SCA) called diffusion map. SCA is a class of non-linear techniques for transforming observed data (e.g. photometric colours for each galaxy, where the data lie on a complex subset of p-dimensional space) to a simpler, more natural coordinate system wherein we apply regression to make redshift predictions. In previous applications of SCA to other astronomical problems, we demonstrated its superiority vis-à-vis principal components analysis, a standard linear technique for transforming data. As SCA relies upon eigen-decomposition, our training set size is limited to ≲10⁴ galaxies; we use the Nyström extension to quickly estimate diffusion coordinates for objects not in the training set. We apply our method to 350 738 Sloan Digital Sky Survey (SDSS) main sample galaxies, 29 816 SDSS luminous red galaxies and 5223 galaxies from DEEP2 with Canada–France–Hawaii Telescope Legacy Survey ugriz photometry. For all three data sets, we achieve prediction accuracies on par with previous analyses, and find that the use of the Nyström extension leads to a negligible loss of prediction accuracy relative to that achieved with the training sets. As in some previous analyses, we observe that our predictions are generally too high (low) in the low (high) redshift regimes. We demonstrate that this is a manifestation of attenuation bias, wherein measurement error (i.e. uncertainty in diffusion coordinates due to uncertainty in the measured fluxes/magnitudes) reduces the slope of the best-fitting regression line. Mitigation of this bias is necessary if we are to use photometric redshift estimates produced by computationally efficient empirical methods in precision cosmology.
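As an illustration of the two ingredients named above, the following is a minimal NumPy sketch of a diffusion map fitted on a training set, plus a Nyström extension that places new objects in the same coordinates. It is not the authors' implementation: the Gaussian kernel, the bandwidth `eps`, the diffusion time `t`, and the function names `fit_diffusion_map`, `train_coords` and `nystrom` are all assumptions made for this example.

```python
import numpy as np

def fit_diffusion_map(X, eps, n_coords=3, t=1):
    """Fit a basic diffusion map on training data X (n_samples x p)."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # squared distances
    K = np.exp(-sq / eps)                                # Gaussian kernel (assumed)
    d = K.sum(axis=1)                                    # row sums
    # Symmetric matrix with the same eigenvalues as the Markov matrix P = K / d.
    A = K / np.sqrt(np.outer(d, d))
    lam, v = np.linalg.eigh(A)                           # ascending eigenvalues
    # Sort descending and skip the trivial eigenvector (eigenvalue 1).
    order = np.argsort(lam)[::-1][1:n_coords + 1]
    lam = lam[order]
    psi = v[:, order] / np.sqrt(d)[:, None]              # right eigenvectors of P
    return {"X": X, "eps": eps, "lam": lam, "psi": psi, "t": t}

def train_coords(model):
    """Diffusion coordinates of the training points."""
    return model["psi"] * model["lam"] ** model["t"]

def nystrom(model, X_new):
    """Nystrom extension: diffusion coordinates for points not in the training set."""
    sq = ((X_new[:, None, :] - model["X"][None, :, :]) ** 2).sum(-1)
    K = np.exp(-sq / model["eps"])
    P = K / K.sum(axis=1, keepdims=True)                 # transition probabilities
    psi_new = (P @ model["psi"]) / model["lam"]          # extend each eigenvector
    return psi_new * model["lam"] ** model["t"]

# Synthetic stand-in for photometric colours (not real survey data).
rng = np.random.default_rng(0)
colours_train = rng.normal(size=(200, 4))
colours_new = rng.normal(size=(10, 4))
model = fit_diffusion_map(colours_train, eps=4.0)
coords_new = nystrom(model, colours_new)                 # shape (10, 3)
```

The extension only requires kernel evaluations between each new object and the training set, so its cost grows linearly with the number of new objects, whereas the eigen-decomposition is done once on the training set. A regression of redshift on `train_coords(model)` would then be applied to `coords_new` to make predictions.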
Photometric redshift estimation is an indispensable tool of precision cosmology. One problem that plagues the use of this tool in the era of large-scale sky surveys is that the bright galaxies that are selected for spectroscopic observation do not have properties that match those of (far more numerous) dimmer galaxies; thus, ill-designed empirical methods that produce accurate and precise redshift estimates for the former generally will not produce good estimates for the latter. In this paper, we provide a principled framework for generating conditional density estimates (i.e. photometric redshift PDFs) that takes into account selection bias and the covariate shift that this bias induces. We base our approach on the assumption that the probability that astronomers label a galaxy (i.e. determine its spectroscopic redshift) depends only on its measured (photometric and perhaps other) properties $\boldsymbol{x}$ and not on its true redshift. With this assumption, we can explicitly write down risk functions that allow us to both tune and compare methods for estimating importance weights (i.e. the ratio of densities of unlabelled and labelled galaxies for different values of $\boldsymbol{x}$) and conditional densities. We also provide a method for combining multiple conditional density estimates for the same galaxy into a single estimate with better properties.
We apply our risk functions to an analysis of ≈10⁶ galaxies, mostly observed by the Sloan Digital Sky Survey, and demonstrate through multiple diagnostic tests that our method achieves good conditional density estimates for the unlabelled galaxies.
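To make the idea of a risk function for importance weights concrete, here is a minimal sketch under stated assumptions. For a candidate weight function β̂(x), the squared-error loss against the true density ratio, integrated over the labelled distribution, equals E_L[β̂(x)²] − 2 E_U[β̂(x)] plus a constant that does not depend on β̂, so it can be estimated from the two samples without knowing the true ratio. The abstract does not spell out the authors' risk, so this particular form, the function name `weight_risk`, and the one-dimensional Gaussian toy data are assumptions of the example.

```python
import numpy as np

def weight_risk(beta_hat, x_labelled, x_unlabelled):
    """Empirical risk (up to an additive constant) of a candidate weight function.

    Lower is better. Uses only samples from the labelled and unlabelled sets.
    """
    return np.mean(beta_hat(x_labelled) ** 2) - 2.0 * np.mean(beta_hat(x_unlabelled))

# Toy covariate shift: labelled x ~ N(0, 1), unlabelled x ~ N(shift, 1).
rng = np.random.default_rng(1)
shift = 0.5
x_lab = rng.normal(0.0, 1.0, size=5000)
x_unl = rng.normal(shift, 1.0, size=5000)

def true_ratio(x):
    # Density ratio N(shift, 1) / N(0, 1), known here only because the data are simulated.
    return np.exp(shift * x - 0.5 * shift ** 2)

def no_correction(x):
    # Ignores the covariate shift: every labelled galaxy gets weight 1.
    return np.ones_like(x)

risk_true = weight_risk(true_ratio, x_lab, x_unl)
risk_flat = weight_risk(no_correction, x_lab, x_unl)
print(f"risk (true ratio):    {risk_true:.3f}")
print(f"risk (no correction): {risk_flat:.3f}")
```

In practice the candidates would be competing weight estimators (or one estimator at different tuning-parameter values), with the risk evaluated on held-out data; the estimator with the smallest risk is preferred.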
Modern surveys have provided the astronomical community with a flood of high-dimensional data, but analyses of these data often occur after their projection to lower-dimensional spaces. In this work, we introduce a local two-sample hypothesis test framework that an analyst may directly apply to data in their native space. In this framework, the analyst defines two classes based on a response variable of interest (e.g. higher-mass galaxies versus lower-mass galaxies) and determines at arbitrary points in predictor space whether the local proportions of objects that belong to the two classes differ significantly from the global proportion. Our framework has myriad potential uses throughout astronomy; here, we demonstrate its efficacy by applying it to a sample of 2487 i-band-selected galaxies observed by the HST ACS in four of the CANDELS program fields. For each galaxy, we have seven morphological summary statistics along with an estimated stellar mass and star-formation rate (SFR). We perform two studies: one in which we determine regions of the seven-dimensional space of morphological statistics where high-mass galaxies are significantly more numerous than low-mass galaxies, and vice versa, and another in which we use SFR in place of mass. We find that we are able to identify such regions, and show that high-mass/low-SFR regions are associated with concentrated and undisturbed galaxies, while galaxies in low-mass/high-SFR regions appear more extended and/or disturbed than their high-mass/low-SFR counterparts.
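One simple way to picture a local two-sample test is sketched below: take the k nearest neighbours of a query point, count how many belong to class 1, and compare that count with the global class proportion using an exact two-sided binomial test. This is an illustration only, not the procedure of the paper: the nearest-neighbour neighbourhood, the binomial test, the helper names `binom_two_sided_p` and `local_test`, and the synthetic two-dimensional data are all assumptions.

```python
from math import comb
import numpy as np

def binom_two_sided_p(k, n, p):
    """Exact two-sided binomial p-value: total probability of outcomes
    no more likely than the observed count k."""
    pmf = [comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(n + 1)]
    return min(1.0, sum(q for q in pmf if q <= pmf[k] * (1 + 1e-9)))

def local_test(X, y, x0, k=50):
    """Compare the class-1 proportion among the k nearest neighbours of x0
    with the global class-1 proportion."""
    dist = np.linalg.norm(X - x0, axis=1)
    idx = np.argsort(dist)[:k]              # indices of the k nearest objects
    n1 = int(y[idx].sum())                  # class-1 count in the neighbourhood
    p_global = float(y.mean())
    return n1 / k, p_global, binom_two_sided_p(n1, k, p_global)

# Synthetic predictors and labels (e.g. 1 = "high mass"), not real galaxy data.
rng = np.random.default_rng(2)
X = rng.uniform(-1.0, 1.0, size=(2000, 2))
p_class1 = np.where(X[:, 0] > 0.5, 0.9, 0.3)  # class 1 is over-represented at x > 0.5
y = (rng.random(2000) < p_class1).astype(int)

local, overall, p_value = local_test(X, y, np.array([0.8, 0.0]))
print(f"local proportion {local:.2f}, global proportion {overall:.2f}, p = {p_value:.2e}")
```

Repeating the test over many query points requires a multiple-testing correction, and the neighbourhood objects also contribute to the global proportion, so the p-values here are approximate; both points are glossed over in this sketch.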