2022
DOI: 10.1093/mnras/stac2733

The probabilistic random forest applied to the QUBRICS survey: improving the selection of high-redshift quasars with synthetic data

Abstract: Several recent works have focused on the search for bright, high-z quasars (QSOs) in the South. Among them, the QUasars as BRIght beacons for Cosmology in the Southern hemisphere (QUBRICS) survey has now delivered hundreds of new spectroscopically confirmed QSOs selected by means of machine learning algorithms. Building upon the results obtained by introducing the probabilistic random forest (PRF) for the QUBRICS selection, we explore in this work the feasibility of training the algorithm on synthetic data to …

citations
Cited by 3 publications
(3 citation statements)
references
References 58 publications
(72 reference statements)
“…This was the motivation that originated the survey QUBRICS (Calderone et al 2019; Boutsia et al 2020 Reis et al 2019) was adopted, with modifications introduced to properly treat upper limits and missing data. In Guarneri et al (2022) the PRF selection was further improved, in particular adding synthetic data to the training sets. In Calderone et al, (submitted) a method, dubbed Michelangelo, has been developed to significantly boost recall 3 in selection algorithms, even in the presence of severely imbalanced datasets, aimed at extending the QUBRICS survey up to z ∼ 5.…”
Section: The Qubrics Survey (mentioning)
confidence: 99%
“…But above all one learns that in ML training sets are the key and biases or scarcity in the training sets can produce unfair results, in facial recognition (Buolamwini & Gebru 2018), autonomous driving, fraud detection as well as in finding high redshift quasars. Synthetic data can be a useful solution in cases where real world data is limited (Chaudhari et al 2022; Guarneri et al 2022).…”
Section: The Qubrics Survey (mentioning)
confidence: 99%