Latent Dirichlet Allocation in predicting clinical trial terminations

Geletta, Simon; Follett, Lendie; Laugerman, Marcia

doi:10.1186/s12911-019-0973-y

Cited by 15 publications

(20 citation statements)

References 11 publications


“…While previous studies 17,18 only used Random Forest, our research demonstrates the predictive capabilities of other models: (1) Random Forest and XGBoost are superior to Logistic Regression when comparing performance over different combinations of features; (2) XGBoost is statistically superior to all models when considering performance with regards to all features; and (3) our ensemble methods are able to properly handle the class imbalance issue, which are very common in this domain.…”

Section: Discussion (mentioning), confidence: 70%

“…Two previous studies utilized clinical trial study characteristics and descriptions from the ClinicalTrials.gov database to predict terminations 17 , 18 . The first study 17 tokenizes the description field to find high/low frequency words in terminated/completed trials as features to train a binary predictive model.…”

Section: Introduction (mentioning), confidence: 99%
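The keyword approach attributed to the first study (high/low frequency words in terminated versus completed trials used as features) can be sketched as follows. This is a minimal illustration, not that study's code. It assumes a simple regex tokenizer, a smoothed log-ratio score for ranking words as terminated-leaning or completed-leaning, and binary presence features. The function names and the smoothing constant `alpha` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumed tokenizer: lowercase, alphabetic runs only.
    return re.findall(r"[a-z]+", text.lower())


def word_log_ratios(terminated_docs, completed_docs, alpha=1.0):
    """Smoothed log ratio of each word's relative frequency in terminated vs. completed descriptions."""
    term = Counter(w for doc in terminated_docs for w in tokenize(doc))
    comp = Counter(w for doc in completed_docs for w in tokenize(doc))
    vocab = set(term) | set(comp)
    term_total = sum(term.values()) + alpha * len(vocab)
    comp_total = sum(comp.values()) + alpha * len(vocab)
    return {w: math.log((term[w] + alpha) / term_total) - math.log((comp[w] + alpha) / comp_total)
            for w in vocab}


def select_keywords(ratios, k):
    """Return the k most terminated-leaning and the k most completed-leaning words."""
    ranked = sorted(ratios, key=ratios.get)
    return ranked[::-1][:k], ranked[:k]


def keyword_features(text, keywords):
    # One binary indicator per selected keyword.
    tokens = set(tokenize(text))
    return [1 if w in tokens else 0 for w in keywords]
```

Positive scores mark words that are relatively more frequent in terminated trials; the selected words then become binary features for any downstream binary classifier.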

“…The first study 17 tokenizes the description field to find high/low frequency words in terminated/completed trials as features to train a binary predictive model. The second study 18 uses Latent Dirichlet Allocation to find topics associated to terminated/completed trials. The corresponding topic probabilities are used as variables in predicting clinical trial terminations.…”

Section: Introduction (mentioning), confidence: 99%
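The LDA-based approach (per-trial topic probabilities used as predictors alongside structured variables) can be sketched with a small collapsed Gibbs sampler. This is an illustrative sketch under assumptions, not the cited study's implementation. The priors `alpha` and `beta`, the iteration count, and the helpers `lda_gibbs` and `combine_features` are hypothetical. A real analysis would typically use an established topic-modeling library.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA.

    docs: list of token lists. Returns theta, one list of topic
    proportions per document (each row sums to 1).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    topics = range(num_topics)

    ndk = [[0] * num_topics for _ in docs]   # document-topic counts
    nkw = [[0] * V for _ in topics]          # topic-word counts
    nk = [0] * num_topics                    # total tokens per topic
    z = []                                   # topic assignment of each token

    # Random initial assignment of every token to a topic.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            ndk[d][k] += 1
            nkw[k][index[w]] += 1
            nk[k] += 1
        z.append(z_d)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = index[w], z[d][i]
                # Remove this token's current assignment from the counts.
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                # Unnormalized full conditional p(z_i = t | all other assignments).
                weights = [(ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                           for t in topics]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1

    # Posterior-mean estimate of each document's topic proportions.
    return [[(ndk[d][t] + alpha) / (len(doc) + num_topics * alpha) for t in topics]
            for d, doc in enumerate(docs)]


def combine_features(structured_rows, theta):
    """Append topic proportions to each trial's structured covariates."""
    return [list(row) + list(t) for row, t in zip(structured_rows, theta)]
```

The rows returned by `combine_features` hold structured and unstructured (topic) information side by side, which is the kind of combined design both cited studies report as more predictive than structured data alone.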

“…The corresponding topic probabilities are used as variables in predicting clinical trial terminations. Both studies determined that the addition of unstructured data to structured data increases the predictive power of a model for terminated clinical trials 17 , 18 . These results provide validity to our research design of using structured and unstructured information as variables to predict clinical trial terminations.…”

Section: Introduction (mentioning), confidence: 99%

“…Further more, the results indicate that the combination of statistic features, created from clinical trial structural information, keyword features and embedding features have the highest predictive performance. Predictive Modeling and Validation: Comparing to existing studies 17 , 18 , we investigate a variety of learning algorithms to address class imbalance and feature combinations for clinical trial termination prediction. Our model achieves over 0.73 AUC and 67% balanced accuracy scores for prediction, representing the best performance for open domain clinical trial prediction.…”

Section: Introduction (mentioning), confidence: 99%
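The two metrics quoted above can be computed as follows. Balanced accuracy is the mean per-class recall, so a classifier that labels every trial "completed" cannot score well on an imbalanced set. AUC is computed here as the probability that a randomly chosen positive (terminated) trial is scored above a randomly chosen negative one. This is a minimal sketch: the function names are illustrative, and the pairwise AUC loop is quadratic, which suits small examples only.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall (sensitivity and specificity for binary labels)."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == c]
        recalls.append(sum(1 for i in idx if y_pred[i] == c) / len(idx))
    return sum(recalls) / len(recalls)


def roc_auc(y_true, scores):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For example, with nine completed trials and one terminated trial, predicting "completed" everywhere gives 90% plain accuracy but a balanced accuracy of only 0.5.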


Predictive modeling of clinical trial terminations using feature engineering and embedding learning

Elkin, Zhu (2021), Sci Rep


In this study, we propose to use machine learning to understand terminated clinical trials. Our goal is to answer two fundamental questions: (1) what common factors or markers are associated with terminated clinical trials? and (2) how can we accurately predict whether a clinical trial will be terminated? The answer to the first question helps stakeholders understand the characteristics of terminated trials and plan their own trials better; the answer to the second question can directly estimate a clinical trial's chance of success in order to minimize costs. Starting from 311,260 trials, we build a testbed of 68,999 samples and use feature engineering to create 640 features reflecting clinical trial administration, eligibility, study information, criteria, etc. Using feature ranking, we find that a handful of features, such as trial eligibility, trial inclusion/exclusion criteria, and sponsor types, are related to clinical trial termination. By using sampling and ensemble learning, we achieve over 67% balanced accuracy and over 0.73 AUC (area under the curve) in predicting clinical trial termination, indicating that machine learning can achieve satisfactory prediction results for clinical trial studies.
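The abstract does not specify which sampling or ensemble scheme was used. One common pattern is to undersample the majority class several times, fit a base learner on each balanced subset, and average the predicted probabilities. The sketch below assumes that pattern with a plain logistic-regression base learner; `undersample`, `fit_logistic`, and `balanced_bagging` are hypothetical names, and this is not the study's implementation (the citing text mentions Random Forest and XGBoost, which are not reproduced here).

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def undersample(X, y, rng):
    """Keep every minority-class row plus an equal-size random draw of majority rows."""
    pos = [i for i, t in enumerate(y) if t == 1]
    neg = [i for i, t in enumerate(y) if t == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    chosen = minority + rng.sample(majority, len(minority))
    return [X[i] for i in chosen], [y[i] for i in chosen]


def fit_logistic(X, y, lr=0.1, epochs=300):
    """Batch-gradient-descent logistic regression; returns a probability function."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return lambda x: sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def balanced_bagging(X, y, n_models=5, seed=0):
    """Fit one base learner per undersampled subset and average their probabilities."""
    rng = random.Random(seed)
    models = [fit_logistic(*undersample(X, y, rng)) for _ in range(n_models)]
    return lambda x: sum(m(x) for m in models) / len(models)
```

Each member sees a balanced subset, so no single model is dominated by the majority class, while averaging over several draws keeps most of the majority-class data in use.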


Drug review sentimental analysis based on modular lexicon generation and a fusion of bidirectional threshold weighted mapping CNN‐RNN

Dubey, Singh, Sheoran et al. (2022), Concurrency and Computation


Summary: In drug review sentiment analysis (SA), users share their experiences after taking drugs, which supports accurate decisions about drug safety and public health. Patient‐written medical and health‐care reviews are among the most valuable and informative textual content on social media, but researchers in natural language processing (NLP) and data mining have not studied them thoroughly. These reviews provide insight into patients' interactions with doctors, their treatment, and their satisfaction or dissatisfaction with health services. Existing approaches suffer from exploding/vanishing gradients and lack sequential modeling. When learning from long reviews, the exploding and vanishing gradient problems occur, which makes the parameters hard to tune and the network hard to train. Existing methods lack sequential modeling because they fail to extract long dependencies in long reviews in both the backward and forward directions. To overcome these issues, we propose a Modular Lexicon Generation and a Fusion of Bidirectional threshold weighted mapping CNN‐RNN (MLBTWCR) for classifying drug reviews based on users' opinions. Aspect-based Modular Lexicon generation using the Advanced Dragon Fly Algorithm (AMLDA) generates the score values for the lexicon and the labels based on aspect. Bidirectional Dropout Long Short‐Term Memory (Bi‐DLSTM) and Bidirectional Gated Recurrent Unit (Bi‐GRU) layers are used to extract long dependencies and to process sequences of arbitrary length in both the backward and forward directions. The method is evaluated on the http://drugslib.com and http://drugs.com datasets. In this evaluation, MLBTWCR achieves an accuracy of 93.02%, recall of 88.72%, error rate of 11.2, false positive rate (FPR) of 11.3, false negative rate (FNR) of 13.6, running time of 15 s, convergence speed of 0.2, and F‐measure of 92.64%. Hence, our method performs well for aspect-based drug review classification.


A clinical trial termination prediction model based on denoising autoencoder and deep survival regression

Qi, Yang, Zou et al. (2024), Quant. Biol.


Effective clinical trials are necessary for understanding medical advances, but early termination of trials can waste resources unnecessarily. Survival models can be used to predict survival probabilities in such trials. However, survival data from clinical trials are sparse, and DeepSurv cannot accurately capture their effective features, which weakens the models' generalization and lowers their prediction accuracy. In this paper, we propose a survival prediction model for clinical trial completion that combines the denoising autoencoder (DAE) and DeepSurv models. The DAE is used to obtain a robust representation of features by breaking the loop of raw features after autoencoder training, and the robust features are then provided to DeepSurv as input for training. The clinical trial dataset used to train the model was obtained from ClinicalTrials.gov. Because many current clinical trials exclude pregnant women, a study of clinical trial completion in pregnant women was conducted. The experimental results showed that the denoising autoencoder and deep survival regression (DAE‐DSR) model was able to extract meaningful and robust features for survival analysis; the C‐indices on the training and test datasets were 0.74 and 0.75, respectively. Compared with the Cox proportional hazards model and the DeepSurv model, the survival analysis curves obtained with the DAE‐DSR model had more prominent features, and the model was more robust and performed better in actual prediction.
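The C-index reported above measures how well predicted risk orders the observed event times. A minimal sketch of Harrell's C-index follows, assuming the event of interest is coded as 1 in `events` (0 = censored) and that a higher `risk_scores` value means an earlier expected event. Pairs with tied times are skipped for simplicity, so results may differ slightly from library implementations; the function name is illustrative.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index: share of comparable pairs whose risk ordering matches their time ordering."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a comparable pair must start from an observed event
        for j in range(n):
            if times[i] < times[j]:
                # Subject i had the event before j was last observed.
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / comparable
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, so the reported 0.74 and 0.75 indicate moderate discrimination.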
