Learning to identify relevant studies for systematic reviews using random forest and external information

Khabsa, Madian; Elmagarmid, Ahmed K.; Ilyas, Ihab F.; Hammady, Hossam M.; Ouzzani, Mourad

doi:10.1007/s10994-015-5535-7

Cited by 69 publications

(58 citation statements)

References 20 publications

(33 reference statements)

“…For instance, Khabsa et al (2016) created a random forests classifier using different feature spaces. In addition to working with lexical features (i.e.…”

Section: Classification Methods (mentioning)

confidence: 99%

“…Related to the 13 new papers related to MLTs in the screening stage, it was apparent that mainly the SVMs and the ensemble methods (such as random forest and Bayesian ensembles) received a lot of attention. For instance, Khabsa et al (2016) created a random forests classifier using different feature spaces. In addition to working with lexical features (i.e.…”

Section: Supervised Learning (mentioning)

confidence: 99%

Machine learning techniques for the automation of literature reviews and systematic reviews in EFSA

Jaspers

Troyer

Aerts

2018

EFS3

This Report presents the results from EFSA project RC/EFSA/AMU/2016/01 related to the implementation of machine learning techniques for literature reviews and systematic reviews in EFSA. An overview of the different steps of a systematic review is provided, along with possible ways for automation. Although it was found that most steps could benefit from automation, it was also observed that some steps require more sophisticated methods than those encompassed within the machine learning framework. Availability of data and methodology allowed for the development of an automatic screening tool based on several machine learning techniques. The developed shiny R application can be used for the screening of abstracts and full texts. Properties of machine learning techniques are discussed in this Report together with their most important advantages and disadvantages. The latter discussion includes both general properties, as well as context-specific properties based on their performance in three case studies. Although creating a universal automatic data extraction tool was considered to be infeasible in this stage, this step of the systematic review was addressed to allow the reviewer to scan the uploaded pdf files for certain words or string of words. Based on observations from the performed case studies, recommendations were made regarding which methods are preferred in specific situations. More explicitly, a discussion is made about the performance of the classifiers with respect to the magnitude of the pool of papers to be screened as well as to the amount of imbalance, referring to the proportion of relevant and irrelevant papers. Finally, it was concluded that the results presented in this report provide proof that the developed shiny application could be efficiently used in combination with other software such as DistillerSR. 
© European Food Safety Authority, 2018. Key words: Systematic Reviews, Machine Learning, screening, data extraction, Sensitivity, Specificity. Disclaimer: The present document has been produced and adopted by the bodies identified above as author(s). This task has been carried out exclusively by the author(s) in the context of a contract between the European Food Safety Authority and the author(s), awarded following a tender procedure. The present document is published complying with the transparency principle to which the Authority is subject. It may not be considered as an output adopted by the Authority. The European Food Safety Authority reserves its rights, view and position as regards the issues addressed and the conclusions reached in the present document, without prejudice to the rights of the authors. Reproduction is authorised provided the source is acknowledged. EFSA Supporting publication 2018:EN-1427.

“…A number of different classification methods have been developed, including support vector machine classification [4,9], voting perceptron [5] and random forest [8]. More sophisticated systems combine both prioritisation via ranking and filtering via classification and provide significant work savings [9].…”

Section: Related Work (mentioning)

confidence: 99%

“…Cohen et al [5] developed an evaluation collection containing 15 systematic drug class reviews. This collection was used also in subsequent work [4,8]. Our test collection provides 94 search strategies.…”

Section: Related Work (mentioning)

confidence: 99%

A Test Collection for Evaluating Retrieval of Studies for Inclusion in Systematic Reviews

Scells

Zuccon

Koopman

et al. 2017

Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval

This version is available at https://strathprints.strath.ac.uk/62696/
A Test Collection for Evaluating Retrieval of Studies for Inclusion in Systematic Reviews. Harrisen Scells.
Abstract: This paper introduces a test collection for evaluating the effectiveness of different methods used to retrieve research studies for inclusion in systematic reviews. Systematic reviews appraise and synthesise studies that meet specific inclusion criteria. Systematic reviews intended for a biomedical science audience use boolean queries with many, often complex, search clauses to retrieve studies; these are then manually screened to determine eligibility for inclusion in the review. This process is expensive and time consuming. The development of systems that improve retrieval effectiveness will have an immediate impact by reducing the complexity and resources required for this process. Our test collection consists of approximately 26 million research studies extracted from the freely available MEDLINE database, 94 review (query) topics extracted from Cochrane systematic reviews, and corresponding relevance assessments. Tasks for which the collection can be used for information retrieval system evaluation are described and the use of the collection to evaluate common baselines within one such task is demonstrated. The test collection is available at https://github.com/ielab/SIGIR2017-PICO-Collection.

“…I explored the Similarity Graph, a unique Rayyan feature related to the five-star ranking algorithm, and think this picture of a set of references is lovely (Figure 4). Khabsa et al [1] described what the Similarity Graph represents.…”

Section: Intended Audience (mentioning)

confidence: 99%

Covidence and Rayyan

Couban

2016

J Can Health Libr Assoc
