Bálint Ármin Pataki scite author profile

Weak gravitational lensing is one of the most promising cosmological probes of the late universe. Several large ongoing (DES, KiDS, HSC) and planned (LSST, Euclid, WFIRST) astronomical surveys attempt to collect even deeper and larger scale data on weak lensing. Due to gravitational collapse, the distribution of dark matter is non-Gaussian on small scales. However, observations are typically evaluated through the two-point correlation function of galaxy shear, which does not capture non-Gaussian features of the lensing maps. Previous studies attempted to extract non-Gaussian information from weak lensing observations through several higher order statistics such as the three-point correlation function, peak counts, or Minkowski functionals. Deep convolutional neural networks (CNN) emerged in the field of computer vision with tremendous success, and they offer a new and very promising framework to extract information from 2D or 3D astronomical data sets, confirmed by recent studies on weak lensing. We show that a CNN is able to yield significantly stricter constraints of (σ8, Ωm) cosmological parameters than the power spectrum using convergence maps generated by full N-body simulations and ray-tracing, at angular scales and shape noise levels relevant for future observations. In a scenario mimicking LSST or Euclid, the CNN yields 2.4–2.8 times smaller credible contours than the power spectrum, and 3.5–4.2 times smaller at noise levels corresponding to a deep space survey such as WFIRST. We also show that at shape noise levels achievable in future space surveys the CNN yields 1.4–2.1 times smaller contours than peak counts, a higher order statistic capable of extracting non-Gaussian information from weak lensing maps.


An improved cosmological parameter inference scheme motivated by deep learning

Ribli

Pataki

Csabai

2018

Nat Astron


Dark matter cannot be observed directly, but its weak gravitational lensing slightly distorts the apparent shapes of background galaxies, making weak lensing one of the most promising probes of cosmology. Several observational studies have measured the effect, and there are ongoing [1,2] and planned [3,4] efforts to provide even larger and higher-resolution weak lensing maps. Due to nonlinearities on small scales, the traditional analysis with two-point statistics does not fully capture all the underlying information [5]. Multiple inference methods have been proposed to extract more details, based on higher order statistics [6,7], peak statistics [8-13], Minkowski functionals [14-16] and, recently, convolutional neural networks (CNN) [17,18]. Here we present an improved convolutional neural network that gives significantly better estimates of the Ωm and σ8 cosmological parameters from simulated convergence maps than state-of-the-art methods and is also free of systematic bias. We show that the network exploits information in the gradients around peaks, and with this insight, we construct a new, easy-to-understand, and robust peak counting algorithm based on the 'steepness' of peaks, instead of their heights. The proposed scheme is even more accurate than the neural network on high-resolution noiseless maps. With shape noise and lower resolution its relative advantage deteriorates, but it remains more accurate than peak counting.
Following the idea and using the simulation data from a recent study [18], we created an improved CNN architecture (see details in the Methods) that recovers cosmological parameters more accurately from simulated weak lensing maps. The input of the network is a set of mock convergence (κ) maps generated by ray-tracing N-body simulations with 96 different values for the matter density Ωm and the scale of the initial perturbations normalized at the late Universe, σ8 (see [18] and [19] for details of the weak lensing map generation); the outputs of the network are the predicted cosmological parameters. The modifications of the CNN mostly consisted of adding further activations, increasing the number of filters, and introducing a regular block structure, following successful computer vision models [20,21].
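The idea of counting peaks by 'steepness' instead of height can be illustrated with a minimal sketch. The abstract does not give the exact definition used in the paper, so the code below makes its own assumptions: a peak is a pixel strictly larger than its 8 neighbours, and its steepness is the mean difference between the peak and those neighbours. The function name `peak_steepness_histogram` and the bin edges are hypothetical.

```python
import numpy as np

def peak_steepness_histogram(kappa, bins):
    """Histogram the local maxima of a 2D map by steepness, not height.

    Assumption (not taken from the paper): steepness is the mean
    difference between a peak pixel and its 8 immediate neighbours.
    """
    kappa = np.asarray(kappa, dtype=float)
    h, w = kappa.shape
    center = kappa[1:-1, 1:-1]  # interior pixels only, so all 8 neighbours exist

    # Offsets of the 8 neighbours around each interior pixel.
    shifts = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)
              if (dy, dx) != (0, 0)]
    neighbours = np.stack([
        kappa[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx] for dy, dx in shifts
    ])

    # A peak is strictly higher than every one of its neighbours.
    is_peak = np.all(center > neighbours, axis=0)
    steepness = np.mean(center - neighbours, axis=0)[is_peak]

    counts, _ = np.histogram(steepness, bins=bins)
    return counts, steepness
```

In this sketch the resulting histogram would play the role that the peak-height histogram plays in ordinary peak counting, serving as the summary statistic compared across simulated cosmologies.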


Crowdsourcing assessment of maternal blood multi-omics for predicting gestational age and preterm birth

Tarca

Pataki

Romero

et al. 2021

Cell Reports Medicine


Understanding and predicting ciprofloxacin minimum inhibitory concentration in Escherichia coli with machine learning

Pataki

Matamoros

Putten

et al. 2020

Sci Rep


It is important that antibiotic prescriptions are based on antimicrobial susceptibility data to ensure effective treatment outcomes. With the increasing availability of next-generation sequencing, bacterial whole genome sequencing (WGS) can provide a more reliable and faster alternative to traditional phenotyping for the detection and surveillance of antimicrobial resistance (AMR). This work proposes a machine learning approach that can predict the minimum inhibitory concentration (MIC) for a given antibiotic, here ciprofloxacin, on the basis of both genome-wide mutation profiles and profiles of acquired antimicrobial resistance genes. We analysed 704 Escherichia coli genomes combined with their respective MIC measurements for ciprofloxacin originating from different countries. The four most important predictors found by the model, mutations in gyrA residues Ser83 and Asp87, a mutation in parC residue Ser80 and presence of the qnrS1 gene, have been experimentally validated before. Using only these four predictors in a linear regression model, 65% and 93% of the test samples’ MIC were correctly predicted within a two- and a four-fold dilution range, respectively. The presented work does not treat machine learning as a black box, but also identifies the genomic features that determine susceptibility. The recent progress in WGS technology in combination with machine learning analysis approaches indicates that in the near future WGS of bacteria might become cheaper and faster than a MIC measurement.
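A four-predictor linear model of this kind, together with the dilution-range accuracy, can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the predictors are encoded as 0/1 columns, the regression target is log2(MIC), and "within an n-fold dilution range" is read as an absolute log2 error of at most log2(n). All function names are hypothetical.

```python
import numpy as np

def fit_log2_mic(features, mic):
    """Ordinary least squares fit of log2(MIC) on binary predictors.

    `features` has one 0/1 column per predictor (assumed order:
    gyrA Ser83, gyrA Asp87, parC Ser80, qnrS1). Returns the intercept
    followed by one coefficient per predictor.
    """
    features = np.asarray(features, dtype=float)
    X = np.column_stack([np.ones(len(features)), features])
    coef, *_ = np.linalg.lstsq(X, np.log2(mic), rcond=None)
    return coef

def predict_log2_mic(coef, features):
    """Predict log2(MIC) from fitted coefficients."""
    features = np.asarray(features, dtype=float)
    X = np.column_stack([np.ones(len(features)), features])
    return X @ coef

def fraction_within_fold(log2_true, log2_pred, fold):
    """Share of samples whose prediction lies within `fold` of the truth.

    Assumption: 'within a two-fold range' means |log2 error| <= 1,
    'within a four-fold range' means |log2 error| <= 2.
    """
    err = np.abs(np.asarray(log2_pred) - np.asarray(log2_true))
    return float(np.mean(err <= np.log2(fold)))
```

Working on the log2 scale is a natural choice here because MIC values are measured in two-fold dilution steps, so one unit of error corresponds to one dilution step.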


Crowdsourcing digital health measures to predict Parkinson’s disease severity: the Parkinson’s Disease Digital Biomarker DREAM Challenge

Sieberts

Schaff

Duda

et al. 2021

npj Digit. Med.


Consumer wearables and sensors are a rich source of data about patients’ daily disease and symptom burden, particularly in the case of movement disorders like Parkinson’s disease (PD). However, interpreting these complex data into so-called digital biomarkers requires complicated analytical approaches, and validating these biomarkers requires sufficient data and unbiased evaluation methods. Here we describe the use of crowdsourcing to specifically evaluate and benchmark features derived from accelerometer and gyroscope data in two different datasets to predict the presence of PD and severity of three PD symptoms: tremor, dyskinesia, and bradykinesia. Forty teams from around the world submitted features, and achieved drastically improved predictive performance for PD status (best AUROC = 0.87), as well as tremor- (best AUPR = 0.75), dyskinesia- (best AUPR = 0.48) and bradykinesia-severity (best AUPR = 0.95).


Crowdsourcing assessment of maternal blood multi-omics for predicting gestational age and preterm birth

Tarca

Pataki

Romero

et al. 2020

Preprint


Identification of pregnancies at risk of preterm birth (PTB), the leading cause of newborn deaths, remains challenging given the syndromic nature of the disease. We report a longitudinal multi-omics study coupled with a DREAM challenge to develop predictive models of PTB. We found that whole blood gene expression predicts ultrasound-based gestational ages in normal and complicated pregnancies (r=0.83), as well as the delivery date in normal pregnancies (r=0.86), with an accuracy comparable to ultrasound. However, unlike the latter, transcriptomic data collected at <37 weeks of gestation predicted the delivery date of one third of spontaneous (sPTB) cases within 2 weeks of the actual date. Based on samples collected before 33 weeks in asymptomatic women we found expression changes preceding preterm prelabor rupture of the membranes that were consistent across time points and cohorts, involving, among others, leukocyte-mediated immunity. Plasma proteomic random forests predicted sPTB with higher accuracy and earlier in pregnancy than whole blood transcriptomic models (e.g. AUROC=0.76 vs. AUROC=0.6 at 27-33 weeks of gestation).
Early identification of patients at risk for obstetrical disease is required to improve health outcomes and develop new therapeutic interventions. One of the "great obstetrical syndromes" [1], preterm birth, defined as birth prior to the completion of 37 weeks of gestation, is the leading cause of newborn deaths worldwide. In 2010, 14.9 million babies were born preterm, accounting for 11.1% of all births across 184 countries, the highest preterm birth rates occurring in Africa and North America [2]. In the United States, the rate of prematurity remained fundamentally unchanged in recent years [3] and it has an annual societal economic burden of at least $26.2 billion [4]. The high incidence of preterm birth is concerning: 29% of all neonatal deaths worldwide, approximately 1 million deaths in total, can be attributed to complications of prematurity [5]. Furthermore, children born prematurely are at increased risk for several short- and long-term complications that may include motor, cognitive, and behavioral impairments [6,7].
Approximately one-third of preterm births are medically indicated for maternal (e.g. preeclampsia) or fetal conditions (e.g. growth restriction); the other two-thirds are categorized as spontaneous preterm births, inclusive of spontaneous preterm labor and delivery with intact membranes (sPTD), and preterm prelabor rupture of the membranes (PPROM) [8]. Preterm birth is a syndrome with multiple etiologies [9], and its complexity makes accurate prediction by a single set of biomarkers difficult. While genetic risk factors for preterm birth have been reported [10], the two most powerful predictors of spontaneous preterm birth are a sonographic short cervix in the midtrimester, and a history of spontaneous preterm birth in a prior pregnancy [11]. As for prevention of the syndrome, vagi...


The COMPARE Data Hubs

Amid

Pakseresht

Silvester

et al. 2019

Preprint


Data sharing enables research communities to exchange findings and build upon the knowledge that arises from their discoveries. Areas of public and animal health as well as food safety would benefit from rapid data sharing when it comes to emergencies. However, ethical, regulatory, and institutional challenges, as well as the lack of suitable platforms that provide an infrastructure for data sharing in structured formats, often lead to data not being shared, or at most shared in the form of supplementary materials in journal publications. Here, we describe an informatics platform that includes workflows for structured data storage, management, and pre-publication sharing of pathogen sequencing data and its analysis interpretations with relevant stakeholders.


Deep learning identification for citizen science surveillance of tiger mosquitoes

Pataki

Garriga

Eritja

et al. 2021

Sci Rep


Global monitoring of disease vectors is undoubtedly becoming an urgent need as the human population rises and becomes increasingly mobile, international commercial exchanges increase, and climate change expands the habitats of many vector species. Traditional surveillance of mosquitoes, vectors of many diseases, relies on catches, which requires regular manual inspection and reporting, and dedicated personnel, making large-scale monitoring difficult and expensive. New approaches are solving the problem of scalability by relying on smartphones and the Internet to enable novel community-based and digital observatories, where people can upload pictures of mosquitoes whenever they encounter them. An example is the Mosquito Alert citizen science system, which includes a dedicated mobile phone app through which geotagged images are collected. This system provides a viable option for monitoring the spread of various mosquito species across the globe, although it is partly limited by the quality of the citizen scientists’ photos. To make the system useful for public health agencies, and to give feedback to the volunteering citizens, the submitted images are inspected and labeled by entomology experts. Although citizen-based data collection can greatly broaden disease-vector monitoring scales, manual inspection of each image is not an easily scalable option in the long run, and the system could be improved through automation. Based on Mosquito Alert’s curated database of expert-validated mosquito photos, we trained a deep learning model to find tiger mosquitoes (Aedes albopictus), a species that is responsible for spreading chikungunya, dengue, and Zika among other diseases. The highly accurate 0.96 area under the receiver operating characteristic curve score promises not only a helpful pre-selector for the expert validation process but also an automated classifier giving quick feedback to the app participants, which may help to keep them motivated. 
In the paper, we also explored the possibilities of using the model to improve future data collection quality as a feedback loop.


