To help researchers conduct systematic reviews and meta-analyses as efficiently and transparently as possible, we designed a tool that accelerates the screening of titles and abstracts. For many tasks, including but not limited to systematic reviews and meta-analyses, the scientific literature needs to be checked systematically. Scholars and practitioners currently screen thousands of studies by hand to determine which to include in their review or meta-analysis. This manual process is error-prone and inefficient because the data are extremely imbalanced: only a fraction of the screened studies is relevant. The future of systematic reviewing will involve interaction with machine learning algorithms to handle the enormous increase in available text. We therefore developed ASReview, an open-source, machine learning-aided pipeline that applies active learning. By means of simulation studies, we demonstrate that active learning can make reviewing far more efficient than manual screening while maintaining high quality. Furthermore, we describe the options offered by this free and open-source research software and present results from user-experience tests. We invite the community to contribute to open-source projects such as ours that provide measurable and reproducible improvements over current practice.
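The screening loop described above can be sketched in a few lines. This is a minimal illustration of certainty-based active learning for screening prioritization, not ASReview's actual implementation: the keyword-overlap "model" and the toy records below are assumptions made for the example.

```python
# Minimal sketch of an active-learning screening loop: after labeling a
# small seed set, the record the model ranks most likely relevant is shown
# to the reviewer next, and the model is updated with every new label.
# The overlap-based scorer and the toy data are illustrative assumptions.

def relevance_score(doc_words, relevant_vocab):
    """Score a record by word overlap with already-labeled relevant records."""
    return len(doc_words & relevant_vocab)

def screen(docs, oracle, seed):
    """docs: list of word sets; oracle: true labels (1 = relevant);
    seed: indices labeled up front. Returns the screening order."""
    order = list(seed)
    pool = [i for i in range(len(docs)) if i not in order]
    while pool:
        vocab = set().union(*(docs[i] for i in order if oracle[i] == 1))
        nxt = max(pool, key=lambda i: relevance_score(docs[i], vocab))
        pool.remove(nxt)
        order.append(nxt)
    return order

docs = [{"anxiety", "onset"}, {"soccer"}, {"anxiety", "relapse"},
        {"cooking"}, {"depression", "onset"}]
labels = [1, 0, 1, 0, 1]
print(screen(docs, labels, seed=[0, 1]))  # → [0, 1, 2, 4, 3]
```

In this toy run all three relevant records (0, 2, 4) surface before the last irrelevant one, which is the efficiency gain the simulation studies quantify at scale.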
Cyclists may have incorrect expectations of the behaviour of automated vehicles in interactions with them, which could introduce extra risks in traffic. This study investigated whether the expectations and behavioural intentions of cyclists when interacting with automated cars differed from those with manually driven cars. A photo experiment was conducted with 35 participants, who judged bicycle-car interactions from the perspective of the cyclist. Thirty photos were presented. The experimental design had one between-subjects factor, instruction (two levels: positive, neutral), and two within-subjects factors: car type (three levels: roof name plate, sticker, and traditional car; the first two external features indicated automated cars) and series (two levels: first, second). Participants were asked how confident they were that they would be noticed by the car shown in the photos, whether the car would stop, and how they would behave themselves. A subset of nine participants was equipped with an eye-tracker. The findings generally point to cautious dispositions towards automated cars: participants were no more confident of being noticed when interacting with either type of automated car than with manually driven cars. Participants were more confident that automated cars would stop for them during the second series and looked significantly longer at automated cars during the first.
Background: Conducting a systematic review requires great screening effort. Various tools have been proposed to speed up the process of screening thousands of titles and abstracts by engaging in active learning. In such tools, the reviewer interacts with machine learning software to identify relevant publications as early as possible. To gain a comprehensive understanding of active learning models for reducing workload in systematic reviews, the current study provides a methodical overview of such models. Active learning models were evaluated across four classification techniques (naive Bayes, logistic regression, support vector machines, and random forest) and two feature extraction strategies (TF-IDF and doc2vec). Moreover, models were evaluated on six systematic review datasets from various research areas to assess the generalizability of active learning models across research contexts. Methods: Performance of the models was assessed by conducting simulations on six systematic review datasets. We defined desirable model performance as maximizing recall while minimizing the number of publications needed to screen. Model performance was evaluated by recall curves, WSS@95, RRF@10, and ATD. Results: Within all datasets, model performance greatly exceeded screening in random order. The models reduced the number of publications needed to screen by 63.9% to 91.7%. Conclusions: Active learning models for screening prioritization show great potential for reducing the workload in systematic reviews. Overall, the naive Bayes + TF-IDF model performed best.
In this paper we apply a gravity framework to user-generated data of a large online housing market platform. We show that gravity describes the patterns of inflow and outflow of hits (mouse clicks, etc.) from one municipality to another, where the municipality of the user defines the origin and the municipality of the property that is viewed defines the destination. By distinguishing serious searchers from recreational searchers we demonstrate that the gravity framework describes geographic search patterns of both types of users. The results indicate that recreational search is centered more around the user’s location than serious search. However, this finding is driven entirely by differences in border effects as there is no difference in the distance effect. By demonstrating that geographic search patterns of both serious and recreational searchers are explained by their physical locations, we present clear evidence that physical location is an important determinant of economic behavior in the virtual realm too.
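The gravity framework referred to above models the flow from an origin to a destination as proportional to the size of both places and inversely related to a power of the distance between them. The following toy sketch shows how a steeper distance decay concentrates search near the user's location; the sizes, distances, and decay parameters are illustrative assumptions, not estimates from the paper.

```python
# Toy gravity model: flow_ij ~ G * size_i * size_j / distance_ij**beta.
# A larger beta means clicks fall off faster with distance.

def gravity_flow(size_origin, size_dest, distance, beta=1.0, g=1.0):
    """Predicted flow (e.g., clicks) from origin to destination."""
    return g * size_origin * size_dest / distance ** beta

# Compare a nearby and a faraway municipality under beta = 2:
near = gravity_flow(100, 50, distance=5, beta=2.0)
far = gravity_flow(100, 50, distance=20, beta=2.0)
print(near / far)  # (20/5)**2 = 16.0: nearby listings draw 16x the clicks
```

In log form this is a linear regression, log(flow) = log G + log size_i + log size_j − beta·log distance, which is how distance and border effects such as those contrasted for serious and recreational searchers are typically estimated.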
Systematic reviews and meta-analyses rank among the most influential forms of research synthesis. However, the screening phase requires an enormous effort in reading and labeling the thousands of papers identified via a systematic search. Active learning-aided systematic reviewing offers a solution by combining machine learning algorithms with user input to reduce the screening load. This project explores the performance of these algorithms and different ways of applying them. It is divided into four studies that evaluate and improve this active learning pipeline. First, the performance and stability of the active learning pipeline were assessed via simulations and re-analysis of the outcomes. Second, a convolutional neural network was developed to improve upon the available machine learning algorithms. Third, the performance of different algorithm combinations was tested and compared. Finally, algorithm-switching models were built for increased performance. The project concludes with proposals for improving active learning-aided systematic reviews based on the combined findings of the four studies. It was found that switching models can outperform the currently used models.
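One way to picture the switching idea is a rule that starts with a cheap shallow model and hands over to a second model once the first one stops surfacing relevant records. The window size and the no-hits criterion below are assumptions made for illustration, not the switching rule from the study.

```python
# Illustrative switching heuristic for active-learning screening:
# stay with the shallow model while it still yields hits among the most
# recently screened records; otherwise switch to a second model
# (e.g., a neural network). Criterion and window are assumptions.

def select_model(recent_labels, window=20):
    """recent_labels: labels of the most recently screened records
    (1 = relevant). Returns which model should rank the next record."""
    return "shallow" if sum(recent_labels[-window:]) > 0 else "neural"

print(select_model([1, 0, 0, 1, 0]))  # shallow model still productive
print(select_model([0] * 40))         # no hits lately: switch models
```

The design choice here is that the shallow model is fast to retrain on every new label, while the heavier model is reserved for the tail of hard-to-find records.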
It is of utmost importance to provide an overview of the strength of evidence for predictive factors and to investigate the current state of evidence for all published and hypothesized factors that contribute to the onset, relapse, and maintenance of anxiety, substance use, and depressive disorders. Thousands of articles have been published on potential factors of these common mental disorders (CMDs), yet a clear overview of all preceding factors and the interactions between them is missing. Therefore, the main aim of the current project was to create a database of potentially relevant papers obtained via a systematic search. The current paper describes every step of the process of constructing the database, from search query to final database. After a broad search and cleaning of the data, we used active learning with a shallow classifier and labeled the first set of papers. Then, we applied a second screening phase in which we switched to a different active learning model (i.e., a neural net) to identify papers that were difficult to find owing to concept ambiguity. In the third round of screening, we checked for incorrectly included or excluded papers in a quality assessment procedure, resulting in the final database. All scripts, data files, and output files of the software are available via Zenodo (for GitHub code), the Open Science Framework (for protocols and output), and DANS (for the datasets) and are referred to in the specific sections, thereby making the project fully reproducible.