A number of methods to formally incorporate historical control information into pre-clinical safety evaluation studies have been proposed in the literature. However, it remains unclear when historical data should be used. Focusing on the logistic-normal model, we investigate situations in which historical studies may prove useful. Aspects of estimation (precision and bias) and testing (power) for the treatment effect are examined under varying conditions: the number of historical control studies, the degree of homogeneity among them, the size of the treatment effect, and the control rate. The possibility of using a selected subset of historical control studies is also explored.
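As a concrete illustration of the setting, the sketch below simulates historical control studies under a logistic-normal model, in which study-specific control rates satisfy logit(p_i) = mu + b_i with b_i ~ N(0, tau^2), and recovers the overall rate and between-study heterogeneity by marginal maximum likelihood. All names and parameter values (K, n, mu, tau) are illustrative assumptions, not the authors' code.

```python
# Minimal sketch of a logistic-normal model for historical control data
# (illustrative assumptions throughout; not the authors' implementation).
import numpy as np
from scipy import optimize, stats
from scipy.special import expit

rng = np.random.default_rng(1)

# Simulate K historical control studies with n animals each.
K, n = 10, 50
mu_true, tau_true = -2.0, 0.5       # logit-scale control rate, between-study SD
p = expit(mu_true + tau_true * rng.standard_normal(K))
y = rng.binomial(n, p)              # observed control events per study

def neg_loglik(theta, y, n, nodes=31):
    """Marginal negative log-likelihood; the random study effect is
    integrated out by Gauss-Hermite quadrature."""
    mu, log_tau = theta
    tau = np.exp(log_tau)
    x, w = np.polynomial.hermite_e.hermegauss(nodes)   # N(0,1) nodes/weights
    w = w / w.sum()
    ll = 0.0
    for yi in y:
        ll += np.log(np.sum(w * stats.binom.pmf(yi, n, expit(mu + tau * x))))
    return -ll

res = optimize.minimize(neg_loglik, x0=[0.0, 0.0], args=(y, n),
                        method="Nelder-Mead")
mu_hat, tau_hat = res.x[0], np.exp(res.x[1])
print(f"logit control rate: {mu_hat:.2f}, between-study SD: {tau_hat:.2f}")
```

The estimated between-study standard deviation governs how much weight historical controls can carry for a new study: the more homogeneous the historical studies, the stronger the borrowing.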
Whole-genome sequencing (WGS) data from high-throughput DNA sequencing technologies remain an increasingly discussed but largely unexplored resource in quantitative microbial risk assessment (QMRA), a public health domain. This is due to challenges including the high dimensionality of WGS data and the heterogeneity of microbial growth phenotype data. This study provides an innovative approach for modeling the impact of population heterogeneity in microbial phenotypic stress response and integrates it into predictive models that take high-dimensional WGS data as input, yielding more precise exposure assessment, with Listeria monocytogenes as an example. Finite mixture models were used to determine the number of sub-populations for each of four stress phenotypes: acid, cold, salt, and desiccation. Predictive machine learning models, selected from six candidate algorithms, used WGS data to predict the sub-population membership of new strains with unknown stress response data. An example QMRA for cultured milk products was conducted with these strains to illustrate the significance of the findings. Increased resistance to stress conditions leads to increased growth and, in turn, a higher likelihood of elevated exposure and probability of illness. Neglecting within-species genetic and phenotypic heterogeneity in microbial stress response may therefore over- or underestimate microbial exposure, and ultimately risk, in QMRA.
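A minimal sketch of the two modeling steps on synthetic data (the study's phenotype measurements, genomic features, and six candidate algorithms are not reproduced here; the random forest below is just one plausible stand-in):

```python
# Illustrative sketch only (synthetic data; not the study's pipeline):
# (1) a finite mixture model splits a stress phenotype into sub-populations,
# (2) a classifier maps WGS-derived features to sub-population membership.
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Synthetic phenotype: stress response for 200 strains drawn from two
# latent sub-populations (sensitive vs. resistant).
n = 200
component = rng.integers(0, 2, n)
phenotype = np.where(component == 0,
                     rng.normal(3.0, 0.4, n),     # sensitive
                     rng.normal(1.0, 0.4, n))     # resistant

# (1) Choose the number of sub-populations by BIC over candidate mixtures.
X_pheno = phenotype.reshape(-1, 1)
bics = {k: GaussianMixture(k, random_state=0).fit(X_pheno).bic(X_pheno)
        for k in range(1, 5)}
k_best = min(bics, key=bics.get)
labels = GaussianMixture(k_best, random_state=0).fit_predict(X_pheno)

# (2) Synthetic stand-in for WGS features (e.g., gene presence/absence);
# a few columns are made informative about the latent component.
X_wgs = rng.integers(0, 2, (n, 300)).astype(float)
X_wgs[:, :5] += component[:, None]                # informative markers

clf = RandomForestClassifier(n_estimators=200, random_state=0)
acc = cross_val_score(clf, X_wgs, labels, cv=5).mean()
print(f"sub-populations found: {k_best}, CV accuracy: {acc:.2f}")
```

Choosing the number of mixture components by BIC and then validating the genotype-to-sub-population classifier by cross-validation mirrors the two-stage logic described above.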
Frailty models have been developed to quantify both heterogeneity and association in multivariate time-to-event data. In recent years, numerous shared and correlated frailty models allowing for different association structures and frailty distributions have been proposed in the survival literature. A bivariate correlated gamma frailty model with an additive decomposition of the frailty variables into a sum of independent gamma components was introduced previously. Although this model has a very convenient closed-form representation of the bivariate survival function, the correlation between event- or subject-specific frailties is bounded above, which becomes a severe limitation when the two frailty variances differ substantially. In this article, we review existing correlated gamma frailty models and propose novel ones based on bivariate gamma frailty distributions. Such models prove useful for the analysis of bivariate survival time data regardless of the censoring type involved. The frailty methodology was applied to right-censored and left-truncated Danish twin mortality data and to current status data from serological surveys of varicella zoster virus and parvovirus B19 infections in Belgium. Our analyses show that correlated gamma frailty models that are more flexible in the imposed association and correlation structure outperform existing frailty models, including the one based on the additive decomposition.
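For reference, the additive decomposition and the correlation bound it implies can be sketched as follows (a standard construction; notation ours, with frailty means normalized to one):

```latex
% Additive correlated gamma frailty construction (notation ours).
% Z_1, Z_2 are the frailties; Y_0 is the shared component.
\[
  Z_1 = Y_0 + Y_1, \qquad Z_2 = \tfrac{\lambda_1}{\lambda_2}\, Y_0 + Y_2,
\]
with independent components
$Y_0 \sim \Gamma(k_0,\lambda_1)$, $Y_1 \sim \Gamma(k_1,\lambda_1)$,
$Y_2 \sim \Gamma(k_2,\lambda_2)$, where $\lambda_j = k_0 + k_j = 1/\sigma_j^2$
ensures $\mathbb{E}[Z_j] = 1$ and $\operatorname{Var}(Z_j) = \sigma_j^2$.
The frailty correlation is then
\[
  \rho = k_0\,\sigma_1\sigma_2
  \;\le\; \min\!\left(\frac{\sigma_1}{\sigma_2},\,\frac{\sigma_2}{\sigma_1}\right),
\]
because $k_1, k_2 \ge 0$ forces $k_0 \le 1/\max(\sigma_1^2,\sigma_2^2)$.
```

This is the upper bound referred to above; it tightens sharply as the two frailty variances move apart, which motivates the more flexible bivariate gamma constructions proposed in the article.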
In a linear multilevel model, the significance of all fixed effects can be determined using F tests under maximum likelihood (ML) or restricted maximum likelihood (REML). In this paper, we demonstrate that in the presence of primary-unit sparseness, the performance of the F test under both REML and ML is rather poor. Using simulations based on the structure of a data example on ceftriaxone consumption in hospitalized children, we studied variability, type I error rate, and power in scenarios with a varying number of secondary units within the primary units. In general, the variability of the estimates of the primary-unit effect decreased as the number of secondary units increased. In the presence of singletons (i.e., primary units containing only one secondary unit), REML consistently outperformed ML, although even under REML the performance of the F test was inadequate. When the primary unit was modeled as a random effect, power was lower and the type I error rate was unstable. Dropping, regrouping, or splitting the singletons could solve either the inflated type I error rate or the low power, while worsening the other. The permutation test appeared to be a valid alternative, as it outperformed the F test, especially under REML. We conclude that in the presence of singletons one should be cautious in using the F test to assess the significance of fixed effects, and we propose the permutation test (under REML) as an alternative.
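A minimal sketch of a cluster-level permutation test of this kind, using statsmodels on simulated data with many singleton primary units (all settings here are illustrative assumptions, not the paper's simulation design):

```python
# Minimal sketch of a primary-unit-level permutation test for a fixed effect
# in a linear mixed model (illustrative assumptions; not the paper's design).
import warnings
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

warnings.simplefilter("ignore")           # silence convergence chatter in refits
rng = np.random.default_rng(7)

# Simulate sparse primary units: most contain a single secondary unit.
n_units = 40
sizes = rng.choice([1, 1, 1, 2, 5], size=n_units)
unit = np.repeat(np.arange(n_units), sizes)
x_unit = rng.integers(0, 2, n_units)      # primary-unit-level covariate
b = 0.5 * rng.standard_normal(n_units)    # primary-unit random effects
y = 0.3 * x_unit[unit] + b[unit] + rng.standard_normal(len(unit))
df = pd.DataFrame({"y": y, "x": x_unit[unit], "unit": unit})

def wald_t(data):
    """Wald statistic for the fixed effect of x under REML."""
    fit = smf.mixedlm("y ~ x", data, groups=data["unit"]).fit(reml=True)
    return fit.tvalues["x"]

t_obs = wald_t(df)

# Permute the covariate across primary units, keeping clusters intact.
n_perm = 200                              # kept small here; increase in practice
exceed = 0
for _ in range(n_perm):
    x_perm = rng.permutation(x_unit)
    exceed += abs(wald_t(df.assign(x=x_perm[unit]))) >= abs(t_obs)
print(f"permutation p-value: {exceed / n_perm:.3f}")
```

Permuting the covariate at the primary-unit level preserves the within-unit dependence structure under the null hypothesis, which is what keeps the test valid when the F test's degrees-of-freedom approximations break down.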