The Galaxy Zoo 1 (GZ1) catalog displays a bias towards the S-wise winding direction in spiral galaxies that has yet to be explained. The lack of an explanation confounds our attempts to verify the Cosmological Principle, and has spurred some debate as to whether a bias exists in the real universe. The bias manifests not only in the obvious case of trying to decide whether the universe as a whole has a winding bias, but also in the more insidious case of selecting which galaxies to include in a winding-direction survey. While the former bias has been accounted for in a previous image-mirroring study, the latter has not. Furthermore, the bias has never been corrected in the GZ1 catalog, as only a small sample of the GZ1 catalog was re-examined during the mirror study. We show that the existing bias is a human selection effect rather than a human chirality bias. In effect, the excess S-wise votes are spuriously "stolen" from the elliptical and edge-on-disk categories, not from the Z-wise category. Thus, when selecting a set of spiral galaxies by imposing a threshold $T$ so that $\max(P_S, P_Z) > T$ or $P_S + P_Z > T$, where $P_S$ and $P_Z$ are the fractions of votes for S-wise and Z-wise spirals, we spuriously select more S-wise than Z-wise galaxies. We show that when a provably unbiased machine selects which galaxies are spirals independent of their chirality, the S-wise surplus vanishes, even if humans are still used to determine the chirality. Thus, when viewed across the entire GZ1 sample (and, by implication, the Sloan catalog), the winding direction of arms in spiral galaxies as viewed from Earth is consistent with the flip of a fair coin.
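The selection effect can be illustrated with a toy Python sketch. The `Votes` class, the helper names, the vote fractions, and the `steal_for_s` bias model are all hypothetical illustrations, not GZ1 data or the authors' analysis code; the sketch only shows how extra S-wise votes drawn from non-spiral categories can bias either threshold rule (maximum of the two spiral vote fractions above T, or their sum above T).

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Votes:
    """Hypothetical vote fractions for one galaxy. Whatever is left over,
    1 - p_s - p_z, went to non-spiral categories (elliptical, edge-on disk)."""
    p_s: float  # fraction of voters who saw an S-wise spiral
    p_z: float  # fraction of voters who saw a Z-wise spiral


def select_max(galaxies, t):
    """Selection rule: max(P_S, P_Z) > t."""
    return [g for g in galaxies if max(g.p_s, g.p_z) > t]


def select_sum(galaxies, t):
    """Selection rule: P_S + P_Z > t."""
    return [g for g in galaxies if g.p_s + g.p_z > t]


def chirality_counts(galaxies):
    """Return (#galaxies with P_S > P_Z, #galaxies with P_Z > P_S)."""
    s = sum(1 for g in galaxies if g.p_s > g.p_z)
    z = sum(1 for g in galaxies if g.p_z > g.p_s)
    return s, z


def steal_for_s(galaxies, delta):
    """Toy bias model (an assumption, not the paper's model): for galaxies
    that are already S-dominant, move `delta` of the vote fraction from the
    non-spiral categories to P_S. P_Z is never changed, so no votes move
    between the S-wise and Z-wise categories."""
    return [replace(g, p_s=g.p_s + delta) if g.p_s > g.p_z else g
            for g in galaxies]


# Mirror-symmetric toy sample: every S-wise galaxy has a Z-wise twin.
fair = [Votes(p, 0.05) for p in (0.55, 0.65, 0.75)]
fair += [Votes(0.05, p) for p in (0.55, 0.65, 0.75)]
biased = steal_for_s(fair, 0.06)

print(chirality_counts(select_max(fair, 0.6)))     # (2, 2)
print(chirality_counts(select_max(biased, 0.6)))   # (3, 2)
print(chirality_counts(select_sum(fair, 0.65)))    # (2, 2)
print(chirality_counts(select_sum(biased, 0.65)))  # (3, 2)
```

Even though no Z-wise votes are converted into S-wise votes in this toy model, both rules admit more S-wise galaxies: the extra S-wise votes push a borderline S-wise galaxy over the threshold while its Z-wise twin stays below it.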
Detecting malicious Portable Executable (PE) files is now commonly approached using statistical and machine learning models. While these models commonly use features extracted from the structure of PE files, we propose that icons from these files can also help better predict malware. We propose an innovative machine learning approach to extract information from icons. Our proposed approach consists of two steps: 1) extracting icon features using summary statistics, histograms of oriented gradients (HOG), and a convolutional autoencoder; 2) clustering icons based on the extracted icon features. Using publicly available data and machine learning experiments, we show that our proposed icon clusters significantly boost the efficacy of malware prediction models. In particular, our experiments show an average accuracy increase of 10% when icon clusters are used in the prediction model.
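A minimal Python sketch of the two-step idea, under stated assumptions: icons are small grayscale pixel grids, only the summary-statistics features are computed (the HOG and convolutional-autoencoder features are omitted), and Lloyd's k-means with farthest-first initialization stands in for the clustering step, since the abstract does not name a clustering algorithm. All function names and the toy icons are hypothetical.

```python
import math


def summary_features(icon):
    """Summary statistics of a grayscale icon given as rows of pixel
    intensities in [0, 255]: mean, standard deviation, min, max."""
    pixels = [p for row in icon for p in row]
    mean = sum(pixels) / len(pixels)
    std = math.sqrt(sum((p - mean) ** 2 for p in pixels) / len(pixels))
    return [mean, std, float(min(pixels)), float(max(pixels))]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=50):
    """Lloyd's k-means; returns one cluster label per point."""
    # Farthest-first initialization: start from the first point, then
    # repeatedly add the point farthest from all chosen centers.
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    labels = None
    for _ in range(iters):
        # Assign each point to its nearest center.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        # Move each center to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical 2x2 icons: two dark ones and two bright ones.
icons = [
    [[10, 20], [15, 25]],
    [[12, 18], [14, 22]],
    [[240, 250], [245, 235]],
    [[230, 240], [250, 245]],
]
features = [summary_features(icon) for icon in icons]
print(kmeans(features, k=2))  # [0, 0, 1, 1]
```

In a setup like the one described, each icon's cluster label could then be one-hot encoded and appended to the PE structural features fed to the malware classifier; that encoding choice is likewise an assumption.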
The Galaxy Zoo project has provided a plethora of valuable morphological data on a large number of galaxies from various surveys. Several biases have been identified in the Galaxy Zoo data, of which users of the data must be aware. Here we report on a newly discovered selection effect. In particular, astronomers interested in studying spiral galaxies may select a set of spiral galaxies based upon a threshold in spirality, which we define as the fraction of Galaxy Zoo humans who have reported seeing spiral structure. One tool that can be used to analyze spiral galaxies is SpArcFiRe, an automated tool that decomposes a spiral galaxy into its constituent spiral arms, providing objective, quantitative data on their structure. One of SpArcFiRe's measures is the pitch angle of spiral arms. We have observed that, when selecting a set of spiral galaxies based on a threshold on Galaxy Zoo spirality, the spiral arms appear to have a mean pitch angle that very clearly increases linearly with redshift for $0.05 < z < 0.085$, even after accounting for the Malmquist bias. We hypothesize that this is a selection effect: tightly wound spiral arms become less visible as spatial resolution and noise degrade the image with increasing redshift, so fewer such galaxies are included in the sample at higher redshifts. We corroborate this hypothesis by artificially degrading images of nearby galaxies, then using a machine learning algorithm trained on Galaxy Zoo data to provide a spirality for each artificially degraded image. The algorithm correctly predicts that the detected spirality of a fixed galaxy decreases as image quality degrades. We then use these spiralities to corroborate the hypothesis that the mean pitch angle of the galaxies remaining above a fixed spirality threshold is higher than that of the galaxies eliminated by the selection effect. 
This demonstrates that users who select samples of galaxies using a threshold of Galaxy Zoo votes must carefully consider the possibility of selection effects on morphological measures, even if the measure itself is believed to be objective and unbiased. Finally, we also perform an empirical sensitivity analysis to demonstrate that SpArcFiRe's output changes in a smooth and predictable fashion in response to changes in its internal algorithmic parameters.
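The final comparison step can be sketched in Python: split a sample at a fixed spirality threshold and compare the mean pitch angle of the galaxies kept with that of the galaxies eliminated. The function name and the (spirality, pitch angle) pairs below are hypothetical toy values chosen to mimic the hypothesized effect (tightly wound, low-pitch-angle galaxies losing spirality), not SpArcFiRe or Galaxy Zoo measurements.

```python
from statistics import mean


def split_by_spirality(galaxies, threshold):
    """Split (spirality, pitch_angle_deg) pairs into the pitch angles of
    galaxies kept by the cut spirality > threshold and of those eliminated."""
    kept = [pa for s, pa in galaxies if s > threshold]
    dropped = [pa for s, pa in galaxies if s <= threshold]
    return kept, dropped


# Hypothetical toy sample: spirality in [0, 1], pitch angle in degrees.
galaxies = [(0.9, 25.0), (0.8, 22.0), (0.7, 20.0),
            (0.4, 10.0), (0.3, 12.0), (0.2, 8.0)]

kept, dropped = split_by_spirality(galaxies, threshold=0.5)
print(round(mean(kept), 2), round(mean(dropped), 2))  # 22.33 10.0
```

With real data, the same split would be applied to the machine-predicted spiralities of the artificially degraded images rather than to toy values.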
Automated quantification of galaxy morphology is necessary because the size of upcoming sky surveys will overwhelm human volunteers. Existing classification schemes are inadequate because (a) their uncertainty increases near the boundary of classes and astronomers need more control over these uncertainties; (b) galaxy morphology is continuous rather than discrete; and (c) sometimes we need to know not only the type of an object, but whether a particular image of the object exhibits visible structure. We propose that regression is better suited to these tasks than classification, and focus specifically on determining the extent to which an image of a spiral galaxy exhibits visible spiral structure. We use the human vote distributions from Galaxy Zoo 1 (GZ1) to train a random forest of decision trees to reproduce the fraction of GZ1 humans who vote for the “Spiral” class. We prefer the random forest model over other black box models like neural networks because it allows us to trace post hoc the precise reasoning behind the regression of each image. Finally, we demonstrate that using features from SpArcFiRe—a code designed to isolate and quantify arm structure in spiral galaxies—improves regression results over and above using traditional features alone, across a sample of 470,000 galaxies from the Sloan Digital Sky Survey.
Automated machine classifications of galaxies are necessary because the size of upcoming surveys will overwhelm human volunteers. We improve upon existing machine classification methods by adding the output of SpArcFiRe to the inputs of a machine learning model. We use the human classifications from Galaxy Zoo 1 (GZ1) to train a random forest of decision trees to reproduce the human vote distributions of the Spiral class. We prefer the random forest model over other black box models like neural networks because it allows us to trace post hoc the precise reasoning behind the classification of each galaxy. We find that, across a sample of 470,000 Sloan galaxies that are large enough that details would be visible if present, the combination of SpArcFiRe outputs with existing SDSS features provides a better machine classification than either one alone when compared against Galaxy Zoo 1. We suggest that adding SpArcFiRe outputs as features to any machine learning algorithm will likely improve its performance.