Lucia Yin scite author profile

Generation and evaluation of artificial mental health records for Natural Language Processing

A serious obstacle to the development of Natural Language Processing (NLP) methods in the clinical domain is the accessibility of textual data. The mental health domain is particularly challenging, partly because clinical documentation relies heavily on free text that is difficult to de-identify completely. This problem could be tackled by using artificial medical data. In this work, we present an approach to generate artificial clinical documents. We apply this approach to discharge summaries from a large mental healthcare provider and discharge summaries from an intensive care unit. We perform an extensive intrinsic evaluation where we (1) apply several measures of text preservation; (2) measure how much the model memorises training data; and (3) estimate clinical validity of the generated text based on a human evaluation task. Furthermore, we perform an extrinsic evaluation by studying the impact of using artificial text in a downstream NLP text classification task. We found that using this artificial data as training data can lead to classification results that are comparable to the original results. Additionally, using only a small amount of information from the original data to condition the generation of the artificial data is successful, which holds promise for reducing the risk of these artificial data retaining rare information from the original data. This is an important finding for our long-term goal of being able to generate artificial clinical data that can be released to the wider research community and accelerate advances in developing computational methods that use healthcare data.

A natural language processing approach for identifying temporal disease onset information from mental healthcare text

Viani

Botelle

Kerwin

et al. 2021

Sci Rep

Receiving timely and appropriate treatment is crucial for better health outcomes, and research on the contribution of specific variables is essential. In the mental health domain, an important research variable is the date of psychosis symptom onset, as longer delays in treatment are associated with worse intervention outcomes. The growing adoption of electronic health records (EHRs) within mental health services provides an invaluable opportunity to study this problem at scale retrospectively. However, disease onset information is often only available in open text fields, requiring natural language processing (NLP) techniques for automated analyses. Since this variable can be documented at different points during a patient’s care, NLP methods that model clinical and temporal associations are needed. We address the identification of psychosis onset by: 1) manually annotating a corpus of mental health EHRs with disease onset mentions, 2) modelling the underlying NLP problem as a paragraph classification approach, and 3) combining multiple onset paragraphs at the patient level to generate a ranked list of likely disease onset dates. For 22/31 test patients (71%) the correct onset date was found among the top-3 NLP predictions. The proposed approach was also applied at scale, allowing an onset date to be estimated for 2483 patients.
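
The patient-level combination step described in this abstract can be illustrated with a minimal sketch. It assumes each onset paragraph has already been classified and linked to a candidate date with a confidence score; the function names, the input tuple layout, and the choice to sum scores per date are illustrative assumptions, not the authors' published method.

```python
from collections import defaultdict


def rank_onset_dates(paragraph_predictions, top_k=3):
    """Combine paragraph-level predictions into a ranked date list per patient.

    paragraph_predictions: iterable of (patient_id, onset_date, score) tuples.
    This tuple layout is a hypothetical input format chosen for illustration.
    """
    totals = defaultdict(lambda: defaultdict(float))
    for patient_id, onset_date, score in paragraph_predictions:
        # Assumption: evidence for the same date is pooled by summing scores.
        totals[patient_id][onset_date] += score

    ranked = {}
    for patient_id, date_scores in totals.items():
        # Highest total first; ties are broken by the date string for stability.
        ordered = sorted(date_scores.items(), key=lambda item: (-item[1], item[0]))
        ranked[patient_id] = [date for date, _ in ordered[:top_k]]
    return ranked


def top_k_accuracy(ranked, gold):
    """Fraction of patients whose gold onset date appears in their ranked list."""
    hits = sum(1 for pid, true_date in gold.items() if true_date in ranked.get(pid, []))
    return hits / len(gold)
```

Under this kind of evaluation, the reported result of 22 correct patients out of 31 corresponds to a top-3 accuracy of about 0.71.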

Time Expressions in Mental Health Records for Symptom Onset Extraction

Viani

Yin

Kam

et al. 2018

For psychiatric disorders such as schizophrenia, longer durations of untreated psychosis are associated with worse intervention outcomes. Data included in electronic health records (EHRs) can be useful for retrospective clinical studies, but much of this is stored as unstructured text which cannot be directly used in computation. Natural Language Processing (NLP) methods can be used to extract this data, in order to identify symptoms and treatments from mental health records, and temporally anchor the first emergence of these. We are developing an EHR corpus annotated with time expressions, clinical entities and their relations, to be used for NLP development. In this study, we focus on the first step, identifying time expressions in EHRs for patients with schizophrenia. We developed a gold standard corpus, compared this corpus to other related corpora in terms of content and time expression prevalence, and adapted two NLP systems for extracting time expressions. To the best of our knowledge, this is the first resource annotated for temporal entities in the mental health domain.

Temporal information extraction from mental health records to identify duration of untreated psychosis

et al. 2020

Background: Duration of untreated psychosis (DUP) is an important clinical construct in the field of mental health, as longer DUP can be associated with worse intervention outcomes. DUP estimation requires knowledge about when psychosis symptoms first started (symptom onset), and when psychosis treatment was initiated. Electronic health records (EHRs) represent a useful resource for retrospective clinical studies on DUP, but the core information underlying this construct is most likely to lie in free text, meaning it is not readily available for clinical research. Natural Language Processing (NLP) is a means of addressing this problem by automatically extracting relevant information in a structured form. As a first step, it is important to identify appropriate documents, i.e., those that are likely to include the information of interest. Next, temporal information extraction methods are needed to identify time references for early psychosis symptoms. This NLP challenge requires solving three different tasks: time expression extraction, symptom extraction, and temporal “linking”. In this study, we focus on the first step, using two relevant EHR datasets. Results: We applied a rule-based NLP system for time expression extraction that we had previously adapted to a corpus of mental health EHRs from patients with a diagnosis of schizophrenia (first referrals). We extended this work by applying this NLP system to a larger set of documents and patients, to identify additional texts that would be relevant for our long-term goal, and developed a new corpus from a subset of these new texts (early intervention services). Furthermore, we added normalized value annotations (“2011-05”) to the annotated time expressions (“May 2011”) in both corpora. The finalized corpora were used for further NLP development and evaluation, with promising results (normalization accuracy 71–86%). To highlight the specificities of our annotation task, we also applied the final adapted NLP system to a different temporally annotated clinical corpus. Conclusions: Developing domain-specific methods is crucial to address complex NLP tasks such as symptom onset extraction and retrospective calculation of duration of a preclinical syndrome. To the best of our knowledge, this is the first clinical text resource annotated for temporal entities in the mental health domain.
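
The normalization step mentioned in this abstract, mapping an expression such as "May 2011" to a year-month value, can be sketched as follows. This is a minimal illustration covering only the "month name + four-digit year" pattern; the function name, the regular expression, and the ISO-style output format are assumptions made here, and the rule-based system used in the study is not reproduced.

```python
import re

# English month names mapped to their month numbers.
MONTHS = {
    name: number
    for number, name in enumerate(
        [
            "january", "february", "march", "april", "may", "june",
            "july", "august", "september", "october", "november", "december",
        ],
        start=1,
    )
}

# A word followed by a four-digit year, e.g. "May 2011".
MONTH_YEAR = re.compile(r"\b([A-Za-z]+)\s+(\d{4})\b")


def normalize_month_year(expression):
    """Return a 'YYYY-MM' value for a 'Month YYYY' expression, else None."""
    match = MONTH_YEAR.search(expression)
    if match is None:
        return None
    month = MONTHS.get(match.group(1).lower())
    if month is None:
        # The word before the year is not a month name.
        return None
    return f"{match.group(2)}-{month:02d}"
```

Relative expressions such as "last week" need a reference date (for example the document creation time) and are deliberately left unresolved in this sketch.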

Implementing Mycoplasma genitalium testing across a London-based sexual health service: A quality improvement project

et al. 2020

Recent national guidelines recommended testing for Mycoplasma genitalium (MG) in clinically-indicated conditions (CIC) including non-gonococcal urethritis (NGU), pelvic inflammatory disease (PID) and epididymo-orchitis. Over five months in 2018 a quality improvement project (QIP) was carried out across three London sexual health clinics with the aim of increasing MG testing rates in CICs. Three Plan-Do-Study-Act (PDSA) cycles were completed: improving IT access, an education event and reminder emails for clinicians who did not test in CIC. To measure testing rates ten patients from each CIC were randomly selected each week and MG testing outcomes were collected. As a balancing measure, we identified the rate of inappropriate MG testing. MG testing rates in patients with NGU increased to 90% following QIP initiation (baseline rate 60%) and this increase was sustained. No increase in MG testing was seen in PID and epididymo-orchitis. Inappropriate MG test rates were high (median of 11%) but remained constant throughout the QIP period. As MG testing is expanding across the UK, we outline a QIP integrating MG testing into a busy multi-site, sexual health service improving testing uptake while not increasing inappropriate testing.

114 Leading a quality improvement project across a London-based sexual health service

Alam

Yin

Khan

et al. 2019

period and 43% in the subsequent period. Length of stay prior to hot clinic was 3.63 days, 2.57 days in the pilot period and then went down further to 1.7 days to 53% of baseline. Admission avoidance was 41% in pilot and 32% subsequently. The Urology Hot Clinic has had a significant impact on reducing ED attendance and length of stay of urology patients in our hospital, and on admission avoidance. It has streamlined the patient pathway reducing burden on multiple departments and patients. The success of the hot clinic at our hospital could serve as an example for other urology departments and potentially other specialties.

An audit on the diagnosis of primary CNS lymphoma

Joe

Yin

Kassam

et al. 2021

Aims: Primary central nervous system lymphoma (PCNSL) is a rare form of non-Hodgkin lymphoma with exclusive manifestations in the central nervous system, leptomeninges and eyes. It forms around 5% of all primary brain tumours. It is an aggressive tumour which has a poor prognosis if left untreated. It is imperative that diagnosis is made in a timely manner so treatment can be started promptly. Therefore, we performed an audit looking into the speed of the diagnostic process of PCNSL in our tertiary Neuro-oncology Unit. Method: Single-centre retrospective review of PCNSL cases referred to a tertiary Neuro-oncology Unit over a six-month period from June to November 2020. Results: A total of 1309 cases were discussed in the Neuro-oncology MDT meeting over the study period. Fourteen cases (6 male, 8 female; median age [range] 66 [59–83] years) were identified as highly likely PCNSL. Neuroimaging suggested PCNSL as the likely diagnosis in twelve patients. Twelve patients were started on steroids after CT or MRI brain scans. Nine patients had a surgical target and proceeded to have diagnostic brain biopsy. Two patients had different working diagnoses and three patients were deemed unsuitable for brain surgery. One patient required repeat brain biopsy. A tissue diagnosis was made in twelve patients. One patient deteriorated rapidly and one patient had a brain lesion that was deemed too high risk for surgery. The median time between neuroimaging and biopsy was 25 days. The median time taken from first investigation to the pathological confirmation of PCNSL was 36 days (range 6–86 days). Conclusion: The chief reason for delay in diagnosis of PCNSL was that patients were started on steroids before diagnostic investigations were completed. Steroids caused the brain lesions to become smaller or disappear. Accordingly, time was needed to allow withdrawal of steroids before diagnostic investigations could be repeated. Diagnostic delays may have been exacerbated by logistical issues associated with COVID-19. We propose that there needs to be greater awareness of how early introduction of steroids can markedly delay the diagnosis of PCNSL.

PTU-4 Prospective validation of the Edinburgh dysphagia score in South-east London

Harley

Yin

Amanuel

et al. 2021


Generation and evaluation of artificial mental health records for Natural Language Processing
