The probabilistic random forest applied to the QUBRICS survey: improving the selection of high-redshift quasars with synthetic data

Guarneri, Francesco; Calderone, Giorgio; Cristiani, S.; Porru, Matteo; Fontanot, Fabio; Boutsia, K.; Cupani, G.; Grazian, A.; D’Odorico, V.; Murphy, Michael T.; Bongiorno, A.; Saccheo, Ivano

doi:10.1093/mnras/stac2733

Cited by 3 publications (3 citation statements)

References 58 publications (72 reference statements)


“…This was the motivation that originated the survey QUBRICS (Calderone et al 2019; Boutsia et al 2020 Reis et al 2019) was adopted, with modifications introduced to properly treat upper limits and missing data. In Guarneri et al (2022) the PRF selection was further improved, in particular by adding synthetic data to the training sets. In Calderone et al. (submitted) a method, dubbed Michelangelo, has been developed to significantly boost recall³ in selection algorithms, even in the presence of severely imbalanced datasets, aimed at extending the QUBRICS survey up to z ∼ 5.…”

Section: The QUBRICS Survey; mentioning (confidence: 99%)

“…But above all one learns that in ML the training sets are the key, and biases or scarcity in the training sets can produce unfair results in facial recognition (Buolamwini & Gebru 2018), autonomous driving and fraud detection, as well as in finding high-redshift quasars. Synthetic data can be a useful solution in cases where real-world data are limited (Chaudhari et al 2022; Guarneri et al 2022).…”

Section: The QUBRICS Survey; mentioning (confidence: 99%)

“…Recall: the fraction of relevant instances (i.e., real high-z QSOs) that are correctly classified by the algorithm. It is a statistical measure related to (but not the same as) the completeness (Guarneri et al 2022).…”

mentioning (confidence: 99%)
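The recall defined in this statement can be made concrete with a short sketch. Assuming a labelled test set in which every source has a true class and a predicted class, recall is the number of true high-z QSOs the classifier recovers divided by all true high-z QSOs in that set, $\mathrm{TP}/(\mathrm{TP}+\mathrm{FN})$; in the words of Calderone et al. (2024), it is "completeness within the considered sample", which is why it differs from the completeness of a survey with respect to the whole quasar population. The function name `recall` and the label `"highz_qso"` are hypothetical choices for illustration.

```python
from typing import Sequence


def recall(true_labels: Sequence[str],
           predicted_labels: Sequence[str],
           positive: str = "highz_qso") -> float:
    """Fraction of sources whose true class is `positive` that were also predicted as `positive`."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # recovered QSOs
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # missed QSOs
    if tp + fn == 0:
        raise ValueError("no sources of the positive class in the test set")
    return tp / (tp + fn)
```

Note that a star misclassified as a high-z QSO does not change recall at all; it only lowers precision, which is why a selection tuned for high recall must be paired with spectroscopic follow-up.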


Spectrographs and Spectroscopists for the Sandage Test

Cristiani, Boutsia, Calderone et al. (2023), Preprint

The redshift drift is a small, dynamic change in the redshift of objects following the Hubble flow. Its measurement provides a direct, real-time, model-independent mapping of the expansion rate of the Universe. It is fundamentally different from other cosmological probes: instead of mapping our (present-day) past light-cone, it directly compares different past light-cones. Being independent of any assumptions on gravity, geometry or clustering, it directly tests the pillars of the ΛCDM paradigm. Recent theoretical studies have uncovered unique synergies with other cosmological probes, including the characterization of the physical properties of dark energy. At the time of the original proposal by Sandage (1962), the expected change in the redshift of objects at cosmological distances appeared to be exceedingly small for reasonable observing times and beyond technological capabilities. In recent decades, progress in spectrographs (e.g. ESPRESSO), in the collecting area of telescopes and in the samples of cosmic beacons, enabled by new datasets and new machine-learning-based selections, has drastically changed the situation, bringing the Redshift Drift Grail within reach. As a consequence, this measurement is a flagship objective of the Extremely Large Telescope (ELT), specifically of its high-resolution spectrograph, ANDES.
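Why the effect was long considered out of reach can be seen from the standard Sandage-Loeb relation for the drift rate, $\dot z = (1+z)H_0 - H(z)$, and the equivalent spectroscopic velocity shift $\Delta v = c\,\Delta z/(1+z)$. The sketch below evaluates both; it assumes a flat ΛCDM cosmology with $H_0 = 70$ km s$^{-1}$ Mpc$^{-1}$ and $\Omega_m = 0.3$, and these parameter values and the function names `hubble` and `redshift_drift` are illustrative assumptions, not taken from the preprint.

```python
import math

C_KM_S = 299_792.458            # speed of light [km/s]
KM_PER_MPC = 3.085_677_581e19   # kilometres per megaparsec
SECONDS_PER_YEAR = 3.155_76e7   # Julian year [s]


def hubble(z: float, h0: float = 70.0, omega_m: float = 0.3) -> float:
    """H(z) in km/s/Mpc for an assumed flat LambdaCDM cosmology."""
    omega_l = 1.0 - omega_m
    return h0 * math.sqrt(omega_m * (1 + z) ** 3 + omega_l)


def redshift_drift(z: float, years: float,
                   h0: float = 70.0, omega_m: float = 0.3) -> tuple[float, float]:
    """Return (delta_z, delta_v in cm/s) accumulated over `years` of observer time."""
    h0_per_s = h0 / KM_PER_MPC          # H0 converted to s^-1
    dt = years * SECONDS_PER_YEAR       # observing baseline [s]
    # Sandage-Loeb relation: dz/dt0 = (1+z) H0 - H(z)
    zdot = ((1 + z) - hubble(z, h0, omega_m) / h0) * h0_per_s
    delta_z = zdot * dt
    # Equivalent velocity shift dv = c dz / (1+z), with c converted to cm/s
    delta_v_cm_s = C_KM_S * 1e5 * delta_z / (1 + z)
    return delta_z, delta_v_cm_s
```

Under these assumptions, `redshift_drift(3.0, 25)` gives a velocity shift of roughly $-6$ cm s$^{-1}$ over a 25-year baseline (negative because $H(z)$ exceeds $(1+z)H_0$ at that redshift), while at $z = 1$ the drift is positive.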


Section: The QUBRICS Survey; mentioning (confidence: 99%), 2 citation statements


Boost recall in quasi-stellar object selection from highly imbalanced photometric datasets

Calderone, Guarneri, Porru et al. (2024), A&A

The identification of bright quasi-stellar objects (QSOs) is of fundamental importance to probe the intergalactic medium and address open questions in cosmology. Several approaches have been adopted to find such sources in the currently available photometric surveys, including machine learning methods. However, the rarity of bright QSOs at high redshifts compared to other contaminating sources (such as stars and galaxies) makes the selection of reliable candidates a difficult task, especially when high completeness is required. We present a novel technique to boost recall (i.e., completeness within the considered sample) in the selection of QSOs from photometric datasets dominated by stars, galaxies, and low-$z$ QSOs (imbalanced datasets). Our heuristic method operates by iteratively removing sources whose probability of belonging to a noninteresting class exceeds a user-defined threshold, until the remaining dataset contains mainly high-$z$ QSOs. Any existing machine learning method can be used as the underlying classifier, provided it allows for a classification probability to be estimated. We applied the method to a dataset obtained by cross-matching PanSTARRS1 (DR2), Gaia (DR3), and WISE, and identified the high-$z$ QSO candidates using both our method and its direct multi-label counterpart. We ran several tests by randomly choosing the training and test datasets, and achieved significant improvements in recall, which increased from ∼50% to ∼85% for QSOs with $z>2.5$, and from ∼70% to ∼90% for QSOs with $z>3$. We also identified 3098 new QSO candidates in a sample of $2.6 \times 10^6$ sources with no known classification. We obtained follow-up spectroscopy for 121 candidates, confirming 107 new QSOs with $z > 2.5$.
Finally, a comparison of our QSO candidates with those selected by an independent method based on Gaia spectroscopy shows that the two samples overlap by more than 90% and that both selection methods are potentially capable of achieving a high level of completeness.
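The iterative removal described in this abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the classifier is retrained on the surviving training sources at every pass (with a fixed model, repeating the same cut would change nothing), it always keeps training sources whose true label is the target class, and it uses a toy Gaussian naive Bayes in place of the PRF or any other probabilistic classifier. The names `train_gaussian_nb`, `iterative_removal`, the label `"highz_qso"` and the 0.9 threshold are all hypothetical.

```python
import math
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

Features = Tuple[float, ...]
Model = Callable[[Features], Dict[str, float]]  # feature vector -> {class: probability}


def train_gaussian_nb(data: Sequence[Tuple[Features, str]]) -> Model:
    """Toy Gaussian naive Bayes standing in for any classifier that yields probabilities."""
    by_class: Dict[str, List[Features]] = defaultdict(list)
    for x, label in data:
        by_class[label].append(x)
    stats = {}
    for label, rows in by_class.items():
        dims = list(zip(*rows))
        means = [sum(d) / len(d) for d in dims]
        # Variance floor avoids division by zero for very small classes.
        variances = [max(sum((v - m) ** 2 for v in d) / len(d), 1e-3)
                     for d, m in zip(dims, means)]
        stats[label] = (math.log(len(rows) / len(data)), means, variances)

    def predict(x: Features) -> Dict[str, float]:
        logp = {}
        for label, (log_prior, means, variances) in stats.items():
            ll = log_prior
            for xi, m, var in zip(x, means, variances):
                ll += -0.5 * math.log(2 * math.pi * var) - (xi - m) ** 2 / (2 * var)
            logp[label] = ll
        top = max(logp.values())  # subtract the max for a numerically stable softmax
        norm = sum(math.exp(v - top) for v in logp.values())
        return {label: math.exp(v - top) / norm for label, v in logp.items()}

    return predict


def iterative_removal(train_set: Sequence[Tuple[Features, str]],
                      candidates: Sequence[Features],
                      target: str,
                      threshold: float = 0.9,
                      max_iter: int = 10,
                      train: Callable[[Sequence[Tuple[Features, str]]], Model] = train_gaussian_nb,
                      ) -> List[Features]:
    """Repeatedly drop sources confidently assigned to a non-target class, retraining each pass."""
    train_set, candidates = list(train_set), list(candidates)
    for _ in range(max_iter):
        model = train(train_set)

        def is_uninteresting(x: Features) -> bool:
            probs = model(x)
            return any(p > threshold for label, p in probs.items() if label != target)

        new_train = [(x, y) for x, y in train_set if y == target or not is_uninteresting(x)]
        new_cands = [x for x in candidates if not is_uninteresting(x)]
        if len(new_train) == len(train_set) and len(new_cands) == len(candidates):
            break  # converged: no remaining source exceeds the threshold
        train_set, candidates = new_train, new_cands
    return candidates
```

The design choice worth noting is that each pass shrinks the dominant contaminant classes in the training set, so the classifier seen by later passes works on a progressively less imbalanced problem; this is the intuition the abstract gives for why recall improves over a single multi-label classification.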


Spectroscopy of QUBRICS quasar candidates: 1672 new redshifts and a golden sample for the Sandage test of the redshift drift

Cristiani, Porru, Guarneri et al. (2023), Monthly Notices of the Royal Astronomical Society

The QUBRICS (QUasars as BRIght beacons for Cosmology in the Southern hemisphere) survey aims at constructing a sample of the brightest quasars with $z \gtrsim 2.5$, observable with facilities in the Southern Hemisphere. QUBRICS makes use of the available optical and IR wide-field surveys in the South and of Machine Learning techniques to produce thousands of bright quasar candidates, of which only a few hundred have been confirmed with follow-up spectroscopy. Taking advantage of the recent Gaia Data Release 3, which contains 220 million low-resolution spectra, and of a newly developed spectral energy distribution fitting technique, designed to combine the photometric information with the Gaia spectroscopy, it has been possible to measure 1672 new secure redshifts of QUBRICS candidates, with a typical uncertainty of $\sigma_z = 0.02$. This significant progress brings QUBRICS closer to one of its primary goals: providing a sample of bright quasars at redshift $2.5 < z < 5$ to perform the Sandage test of the cosmological redshift drift. A Golden Sample of seven quasars is presented that makes it possible to carry out this experiment in about 1500 hours of observation over 25 years, using the ANDES spectrograph at the 39 m ELT, a significant improvement with respect to previous estimates.
