2019
DOI: 10.1186/s12911-019-0973-y

Latent Dirichlet Allocation in predicting clinical trial terminations

Abstract: Background: This study used natural language processing (NLP) and machine learning (ML) techniques to identify reliable patterns within research narrative documents that distinguish studies that complete successfully from those that terminate. Recent research has reported that at least 10% of all studies funded by major research funding agencies terminate without yielding useful results. Since it is well-known that scientific studies that receive funding from major funding agencies are…
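The abstract is truncated here and does not spell out the pipeline. As a rough illustration of the idea named in the title, the sketch below assumes that each trial's free-text description is tokenized, Latent Dirichlet Allocation (LDA) topic proportions are used as features, and a simple classifier separates terminated from completed trials. The functions `lda_topic_features`, `train_logistic` and `predict_proba`, the hyperparameters, the choice of logistic regression, and the toy documents are all hypothetical; this is not the authors' implementation.

```python
import math
import random


def lda_topic_features(docs, n_topics, alpha=0.1, beta=0.01, n_iter=100, seed=0):
    """Fit LDA with collapsed Gibbs sampling and return per-document topic proportions."""
    rng = random.Random(seed)
    vocab = {w: i for i, w in enumerate(sorted({w for doc in docs for w in doc}))}
    n_vocab = len(vocab)
    ndk = [[0] * n_topics for _ in docs]            # topic counts per document
    nkw = [[0] * n_vocab for _ in range(n_topics)]  # word counts per topic
    nk = [0] * n_topics                              # total tokens per topic
    z = []                                           # current topic of every token

    def update(d, v, k, delta):
        ndk[d][k] += delta
        nkw[k][v] += delta
        nk[k] += delta

    # Start from a random topic for every token.
    for d, doc in enumerate(docs):
        z.append([rng.randrange(n_topics) for _ in doc])
        for w, k in zip(doc, z[d]):
            update(d, vocab[w], k, 1)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = vocab[w]
                update(d, v, z[d][i], -1)  # take the token out of the counts
                # p(z = k | rest) is proportional to (n_dk + alpha)(n_kw + beta) / (n_k + V*beta)
                weights = [(ndk[d][k] + alpha) * (nkw[k][v] + beta) / (nk[k] + n_vocab * beta)
                           for k in range(n_topics)]
                z[d][i] = rng.choices(range(n_topics), weights=weights)[0]
                update(d, v, z[d][i], 1)   # add it back under the sampled topic

    # Posterior mean of each document's topic mixture theta_d (each row sums to 1).
    return [[(ndk[d][k] + alpha) / (len(doc) + n_topics * alpha) for k in range(n_topics)]
            for d, doc in enumerate(docs)]


def predict_proba(model, x):
    """Logistic model: P(terminated | topic proportions x)."""
    w, b = model
    return 1.0 / (1.0 + math.exp(-(sum(wj * xj for wj, xj in zip(w, x)) + b)))


def train_logistic(X, y, lr=0.5, epochs=500):
    """Unregularized logistic regression fitted by batch gradient descent."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = predict_proba((w, b), xi) - yi
            gw = [g + err * xij for g, xij in zip(gw, xi)]
            gb += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, gw)]
        b -= lr * gb / len(X)
    return w, b


# Toy trial descriptions (hypothetical); label 1 = terminated, 0 = completed.
docs = [
    "slow enrollment sponsor withdrew funding".split(),
    "enrollment target not met study withdrawn".split(),
    "primary endpoint met results analysis completed".split(),
    "follow up completed results published analysis".split(),
]
labels = [1, 1, 0, 0]

theta = lda_topic_features(docs, n_topics=2)
model = train_logistic(theta, labels)
print([round(predict_proba(model, t), 2) for t in theta])
```

Collapsed Gibbs sampling is used here only because it fits in a few dozen lines of standard-library Python; in practice a library implementation such as scikit-learn's `LatentDirichletAllocation` (which uses variational Bayes) would be the more usual choice.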

Cited by 15 publications (20 citation statements)
References 11 publications
“…While previous studies [17,18] only used Random Forest, our research demonstrates the predictive capabilities of other models: (1) Random Forest and XGBoost are superior to Logistic Regression when comparing performance over different combinations of features; (2) XGBoost is statistically superior to all models when considering performance with regards to all features; and (3) our ensemble methods are able to properly handle the class imbalance issue, which are very common in this domain.…”
Section: Discussion (mentioning), confidence: 70%
“…Two previous studies utilized clinical trial study characteristics and descriptions from the ClinicalTrials.gov database to predict terminations [17,18]. The first study [17] tokenizes the description field to find high/low frequency words in terminated/completed trials as features to train a binary predictive model.…”
Section: Introduction (mentioning), confidence: 99%
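The Introduction statement above summarizes the earlier word-frequency approach in a single sentence. The sketch below shows one way such high/low-frequency word features could be built; the smoothed document-frequency ratio used for ranking, the `top_n` cutoff, the function names `frequency_keywords` and `to_features`, and the toy records are assumptions for illustration, not details taken from that study. The resulting binary vectors would then be passed to a binary classifier, which this sketch omits.

```python
from collections import Counter


def doc_freq(docs):
    """Count the documents in which each word appears at least once."""
    return Counter(w for doc in docs for w in set(doc))


def frequency_keywords(docs, labels, top_n=5, smoothing=1.0):
    """Return (high, low): words most over-represented in terminated (label 1)
    and in completed (label 0) trial descriptions, ranked by a smoothed ratio."""
    term = [doc for doc, y in zip(docs, labels) if y == 1]
    comp = [doc for doc, y in zip(docs, labels) if y == 0]
    df_t, df_c = doc_freq(term), doc_freq(comp)
    vocab = set(df_t) | set(df_c)

    def ratio(w):
        # Smoothed share of terminated docs containing w over share of completed docs.
        p_t = (df_t[w] + smoothing) / (len(term) + 2 * smoothing)
        p_c = (df_c[w] + smoothing) / (len(comp) + 2 * smoothing)
        return p_t / p_c

    # Ties are broken alphabetically so the ranking is deterministic.
    high = sorted(vocab, key=lambda w: (-ratio(w), w))[:top_n]
    low = sorted(vocab, key=lambda w: (ratio(w), w))[:top_n]
    return high, low


def to_features(doc, keywords):
    """Binary bag-of-keywords vector: 1 if the keyword occurs in the document."""
    present = set(doc)
    return [1 if kw in present else 0 for kw in keywords]


# Toy records (hypothetical); label 1 = terminated, 0 = completed.
docs = [
    "slow enrollment sponsor withdrew".split(),
    "enrollment stopped sponsor decision".split(),
    "results analysis completed".split(),
    "results published analysis".split(),
]
labels = [1, 1, 0, 0]

high, low = frequency_keywords(docs, labels, top_n=2)
keywords = high + low
X = [to_features(doc, keywords) for doc in docs]
print(keywords)  # ['enrollment', 'sponsor', 'analysis', 'results']
print(X)         # [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
```

Document frequency (rather than raw token counts) keeps one verbose description from dominating the ranking, and the additive smoothing avoids division by zero for words seen in only one class.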