2022
DOI: 10.1101/2022.08.10.503489
Preprint

Automatic identification of drug-induced liver injury literature using natural language processing and machine learning methods

Abstract: Drug-induced liver injury (DILI) is an adverse hepatic drug reaction that can potentially lead to life-threatening liver failure. Previously published work in the scientific literature on DILI has provided valuable insights for the understanding of hepatotoxicity as well as drug development. However, the manual search of scientific literature in PubMed is laborious. Natural language processing (NLP) techniques have been developed to decipher and understand the meaning of human language by extracting useful inf…

Cited by 3 publications
(3 citation statements)
References 3 publications
“…They proposed that certain features made the classification strongly biased, allowing simpler approaches to more efficiently classify the different disease states. Similarly, Oh et al observed that TF-IDF, word2vec, and a combined hybrid approach all outperformed BERT in identifying literature involving drug-induced liver injury [22]. These results, as well as ours, suggest that BERT and similar embedding methods, as is, may not be the best candidates for clinical data, although they have dominated almost every other domain they have been applied to.…”
Section: Discussion; citation type: supporting
confidence: 76%
“…When validating these models on in-sample notes from 2000-2020, the ensemble method and SE-K method consistently outperformed BERT and SE-E. The fact that the ensemble method outperformed the other three models is not surprising, as other groups have found that combining different approaches leads to higher precision, recall, and accuracy than the individual methods alone [22,23]. What is surprising, however, is the observation that SE-K, a model based on TF-IDF, consistently outperformed the more resource-intensive state-of-the-art BERT in extracting information from clinical notes, additionally in many cases had similar results to the majority vote which is usually the best performing model.…”
Section: Discussion; citation type: mentioning
confidence: 80%