2020
DOI: 10.1051/0004-6361/201936770

Identifying galaxies, quasars, and stars with machine learning: A new catalogue of classifications for 111 million SDSS sources without spectra

Abstract: We used 3.1 million spectroscopically labelled sources from the Sloan Digital Sky Survey (SDSS) to train an optimised random forest classifier using photometry from the SDSS and the Widefield Infrared Survey Explorer. We applied this machine learning model to 111 million previously unlabelled sources from the SDSS photometric catalogue which did not have existing spectroscopic observations. Our new catalogue contains 50.4 million galaxies, 2.1 million quasars, and 58.8 million stars. We provide individual clas…

Cited by 69 publications (63 citation statements)
References: 52 publications

“…This adds to the complexity of model optimization. The results on faint end extrapolation are reported to have a high impact on the estimation reliability (e.g., Shu et al 2019; Clarke et al 2020; Logan & Fotopoulou 2020). We achieved satisfactory extrapolation results in r < 23.5, which is 1.5 magnitude larger than the SDSS limit.…”
Section: Limitations and Possible Improvements
Citation type: mentioning
confidence: 54%
“…The impact of the 5σ limits requested in W3 and W4 bands was investigated on the star/galaxy/quasar data set provided by Clarke et al (2020). The magnitude selection cuts W3 < 11.32 and W4 < 8.0 were found to remove a very large fraction (∼98 per cent) of extragalactic sources from the sample but, unfortunately, also potential radio stars.…”
Section: Cross-matching With Supplementary Infrared Surveys
Citation type: mentioning
confidence: 99%
“…Particularly for quasars, despite how important they are to a wide range of astronomy studies and research, their sample sizes are still in the relative minority class (Clarke et al, 2020). It is only through improved classification processes can significant increases be made in the sample sizes of quasars and other celestial objects, consequently enabling further progress to be made in research.…”
Section: Significance and Literature Review
Citation type: mentioning
confidence: 99%
“…With modern telescopes recording an increasingly large amount of astronomical data, the usage of machine learning models in the task of classification has become more and more significant and prevalent because of their accuracy and speed (Clarke et al, 2020). While reviews of previous studies with similar objectives of classifications found both supervised and unsupervised machine learning models to be capable of great performance with accuracy and other metrics measured at over 90%, the supervised models had generally higher accuracies in classification, and unsupervised models were shown to be more effective at detecting unknown objects (Viquar et al, 2018;Zhang et al, 2013).…”
Section: Significance and Literature Review
Citation type: mentioning
confidence: 99%