Weakly supervised natural language processing for assessing patient-centered outcome following prostate cancer treatment

Banerjee, Imon; Li, Kevin; Seneviratne, Martin; Ferrari, Michelle; Seto, Tina; Brooks, James D.; Rubin, Daniel L.; Hernandez‐Boussard, Tina

doi:10.1093/jamiaopen/ooy057

Cited by 39 publications

(39 citation statements)

References 33 publications


“…In addition, CNN model was less successful at identifying the "severe" category compared to the mild and moderate Both rule-based and machine learning NLP approaches to leverage granular data in EHRs are common and their accuracy has been demonstrated in many recent studies. 5,6,22 In our study, the rule-based approach depends on human expertise outperformed the machine learning approach on UI severity extraction task. As reported in other studies, 9,22,23,25-27 the underlying reason for this may be that the hand-designed rules that precisely capture specific patterns overfit with the data.…”

Section: Figure (mentioning)

confidence: 68%

“…A window size of 8 was chosen since these hyperparameters have been shown to work well in other studies utilizing word vectors. 6,22 The model was trained for 10 epochs using the Gensim python library implementation of word2vec. 23 The trained vectors have the desirable property of assigning semantically similar words close together in Euclidean space.…”

Section: CNN Methods (mentioning)

confidence: 99%

“…As a second cohort filtering, we used a previously described machine learning method to categorize patients into affirmed and negated classification of UI. 6 The present study used only those notes that were categorized as affirmed UI (0.90 F1 score) by this algorithm to reduce computation time. This method yielded a total of 19 213 notes of 3612 patients with predicted affirmed UI (Figure 1).…”

Section: Study Cohort (mentioning)

confidence: 99%


Phenotyping severity of patient‐centered outcomes using clinical notes: A prostate cancer use case

Bozkurt, Paul, Coquet, et al. 2020

Learning Health Systems

Self Cite

Introduction: A learning health system (LHS) must improve care in ways that are meaningful to patients, integrating patient-centered outcomes (PCOs) into core infrastructure. PCOs are common following cancer treatment, such as urinary incontinence (UI) following prostatectomy. However, PCOs are not systematically recorded because they can only be described by the patient, are subjective, and are captured as unstructured text in the electronic health record (EHR). Therefore, PCOs pose significant challenges for phenotyping patients. Here, we present a natural language processing (NLP) approach for phenotyping patients with UI to classify their disease into severity subtypes, which can increase opportunities to provide precision-based therapy and promote a value-based delivery system. Methods: Patients undergoing prostate cancer treatment from 2008 to 2018 were identified at an academic medical center. Using a hybrid NLP pipeline that combines rule-based and deep learning methodologies, we classified positive UI cases as mild, moderate, and severe by mining clinical notes. Results: The rule-based model accurately classified UI into disease severity categories (accuracy: 0.86), which outperformed the deep learning model (accuracy: 0.73). In the deep learning model, the recall rates for the mild and moderate groups were higher than the precision rates (0.78 and 0.79, respectively). A hybrid model that combined both methods did not improve the accuracy of the rule-based model but did outperform the deep learning model (accuracy: 0.75). Conclusion: Phenotyping patients based on indication and severity of PCOs is essential to advance a patient-centered LHS. EHRs contain valuable information on PCOs

Selen Bozkurt and Rohan Paul shared first authorship.


“…Regardless of performance, the model will understate the true prevalence of UI if either clinicians or patients do not report all symptoms, or if the manner in which UI is documented is not captured by the design of the pipeline. 13 Second, this study was performed at a single institution, which limits the generalizability and the power of this analysis. Future studies in other healthcare settings and institutions could allow us to assess the reproducibility of NLP-derived findings given potential variability in physician documentation and patient population.…”

Section: Discussion (mentioning)

confidence: 99%

“…We assessed UI for each patient using an NLP pipeline that annotates EHR free-text notes as previously reported. 13 This open-source pipeline cleans EHR free-text notes and extracts sentences containing at least one of 61 unique terms indicative of urinary incontinence, a dictionary that was created by a group of urology professionals at our institution (Supplement Table 2). The pipeline does not require manually labeled text for training the NLP model.…”

Section: Electronic Health Record Processing (mentioning)

confidence: 99%
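The dictionary-based sentence extraction described in the statement above can be illustrated with a minimal sketch. This is not the cited open-source pipeline: the three-entry term list is a hypothetical stand-in for the 61-term dictionary (Supplement Table 2, not reproduced here), and the function names and the naive sentence splitter are assumptions made for illustration only.

```python
import re

# Hypothetical stand-in for the 61-term UI dictionary; the real terms
# are listed in the cited paper's supplement and are not reproduced here.
UI_TERMS = ["incontinence", "leakage", "pads per day"]


def clean_note(note):
    """Collapse runs of whitespace so line breaks do not split sentences."""
    return re.sub(r"\s+", " ", note).strip()


def split_sentences(note):
    """Naive splitter (assumption): break after '.', '!' or '?'."""
    parts = re.split(r"(?<=[.!?])\s+", clean_note(note))
    return [p for p in parts if p]


def extract_ui_sentences(note, terms=UI_TERMS):
    """Return the sentences that contain at least one dictionary term."""
    pattern = re.compile(
        "|".join(r"\b" + re.escape(t) + r"\b" for t in terms),
        re.IGNORECASE,
    )
    return [s for s in split_sentences(note) if pattern.search(s)]


if __name__ == "__main__":
    example = (
        "Patient doing well. Reports urinary incontinence at night.\n"
        "Uses 2 pads per day! No fever."
    )
    for sentence in extract_ui_sentences(example):
        print(sentence)
```

Because matching relies only on a curated term list, a sketch like this needs no manually labeled text, which is the property the statement highlights; a production pipeline would need a clinical-grade sentence splitter and the full dictionary.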

Clinical Documentation to Predict Factors Associated with Urinary Incontinence Following Prostatectomy for Prostate Cancer

Banerjee, Magnani, et al. 2020

RRU

Self Cite

Background: Advances in data collection provide opportunities to use population samples in identifying risk factors for urinary incontinence (UI), which occurs in up to 71% of men with prostate cancer following prostatectomy. Most studies on patient-centered outcomes use surveys or manual chart abstraction for data collection, which can be costly and difficult to scale. We sought to evaluate rates of and risk factors for UI following prostatectomy using natural language processing on electronic health record (EHR) data. Methods: We conducted a retrospective analysis of patients undergoing prostatectomy for prostate cancer between January 2008 and August 2018 using EHR data from an academic medical center. UI incidence for each patient in the cohort was assessed using natural language processing from clinical notes generated pre- and postoperatively. Multivariable logistic regression was used to evaluate potential risk factors for postoperative UI at various time points within 2 years following surgery. Results: We identified 3792 patients who underwent prostatectomy for prostate cancer. We found a significant association between preoperative UI and UI in the first (odds ratio [OR], 2.30; 95% confidence interval [CI], 1.24-4.28) and second (OR 2.24, 95% CI 1.04-4.83) years following surgery. Preoperative body mass index was also associated with UI in the second postoperative year (OR 1.11, 95% CI 1.02-1.21). Conclusion: We show that a natural language processing approach using clinical narratives can be used to assess risk for UI in prostate cancer patients. Unstructured clinical narrative text can help advance future population-level research in patient-centered outcomes and quality of care.


Automated labelling of radiology reports using natural language processing: Comparison of traditional and newer methods

Chng, Tern, Kan, et al. 2023

Health Care Science

Automated labelling of radiology reports using natural language processing allows for the labelling of ground truth for large datasets of radiological studies that are required for training of computer vision models. This paper explains the necessary data preprocessing steps, reviews the main methods for automated labelling and compares their performance. There are four main methods of automated labelling, namely: (1) rules‐based text‐matching algorithms, (2) conventional machine learning models, (3) neural network models and (4) Bidirectional Encoder Representations from Transformers (BERT) models. Rules‐based labellers perform a brute force search against manually curated keywords and are able to achieve high F1 scores. However, they require proper handling of negative words. Machine learning models require preprocessing that involves tokenization and vectorization of text into numerical vectors. Multilabel classification approaches are required in labelling radiology reports and conventional models can achieve good performance if they have large enough training sets. Deep learning models make use of connected neural networks, often a long short‐term memory network, and are similarly able to achieve good performance if trained on a large data set. BERT is a transformer‐based model that utilizes attention. Pretrained BERT models only require fine‐tuning with small data sets. In particular, domain‐specific BERT models can achieve superior performance compared with the other methods for automated labelling.
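As a minimal sketch of the first method in the abstract above, a rules-based labeller can search each sentence for curated keywords and suppress matches preceded by a negation cue. The keyword and cue lists and the function name below are hypothetical illustrations, not taken from the reviewed paper, and the negation rule (a cue anywhere earlier in the same sentence) is a deliberate simplification.

```python
import re

# Hypothetical keyword dictionary: label name -> surface forms to search for.
KEYWORDS = {
    "pneumothorax": ["pneumothorax"],
    "effusion": ["pleural effusion", "effusion"],
}

# Hypothetical negation cues; real labellers use richer rule sets.
NEGATION_CUES = ["no", "without", "negative for", "no evidence of"]


def _contains(phrase, text):
    """Whole-word, case-sensitive search (callers lowercase the text)."""
    return re.search(r"\b" + re.escape(phrase) + r"\b", text)


def label_report(report):
    """Return {label: bool} using keyword search with simple negation handling."""
    labels = {name: False for name in KEYWORDS}
    # Treat '.', ';' and newlines as sentence boundaries.
    for sentence in re.split(r"[.;\n]+", report.lower()):
        for name, variants in KEYWORDS.items():
            for keyword in variants:
                match = _contains(keyword, sentence)
                if not match:
                    continue
                # A mention counts as negated if any cue precedes it
                # within the same sentence.
                prefix = sentence[: match.start()]
                negated = any(_contains(cue, prefix) for cue in NEGATION_CUES)
                if not negated:
                    labels[name] = True
    return labels


if __name__ == "__main__":
    print(label_report("No pneumothorax. Small left pleural effusion is present."))
```

Restricting the negation scope to the current sentence keeps the rule easy to audit, at the cost of missing negations that span sentences or follow the keyword.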

