A general supervised approach to segmentation of clinical texts

Ganesan, Kavita; Subotin, Michael

doi:10.1109/bigdata.2014.7004390

Cited by 16 publications

(17 citation statements)

References 9 publications


“…Finally, seven hybrid approaches use rule-based methods during the creation of training and test data sets, and then apply ML methods. This is the case of Apostolova et al [1], Sadoughi et al [46], Ni et al [40], Chen et al [5], Dai et al [7], Jancsary et al [20], and Ganesan and Subotin [16]. Others use rules for detecting the explicit sections and an ML algorithm for detecting implicit sections, as done in Cho et al [6].…”

Section: Results (mentioning)

confidence: 99%

Current approaches to identify sections within clinical narratives from electronic health records: a systematic review

Pomares-Quimbaya

Kreuzthaler

Schulz

2019

BMC Med Res Methodol


Background: The identification of sections in narrative content of Electronic Health Records (EHR) has demonstrated to improve the performance of clinical extraction tasks; however, there is not yet a shared understanding of the concept and its existing methods. The objective is to report the results of a systematic review concerning approaches aimed at identifying sections in narrative content of EHR, using both automatic or semi-automatic methods.

Methods: This review includes articles from the databases: SCOPUS, Web of Science and PubMed (from January 1994 to September 2018). The selection of studies was done using predefined eligibility criteria and applying the PRISMA recommendations. Search criteria were elaborated by using an iterative and collaborative keyword enrichment.

Results: Following the eligibility criteria, 39 studies were selected for analysis. The section identification approaches proposed by these studies vary greatly depending on the kind of narrative, the type of section, and the application. We observed that 57% of them proposed formal methods for identifying sections and 43% adapted a previously created method. Seventy-eight percent were intended for English texts and 41% for discharge summaries. Studies that are able to identify explicit (with headings) and implicit sections correspond to 46%. Regarding the level of granularity, 54% of the studies are able to identify sections, but not subsections. From the technical point of view, the methods can be classified into rule-based methods (59%), machine learning methods (22%) and a combination of both (19%). Hybrid methods showed better results than those relying on pure machine learning approaches, but lower than rule-based methods; however, their scope was more ambitious than the latter ones. Despite all the promising performance results, very few studies reported tests under a formal setup. Almost all the studies relied on custom dictionaries; however, they used them in conjunction with a controlled terminology, most commonly the UMLS® Metathesaurus.

Conclusions: Identification of sections in EHR narratives is gaining popularity for improving clinical extraction projects. This study enabled the community working on clinical NLP to gain a formal analysis of this task, including the most successful ways to perform it.


“…By contrast, studies and resources related to the recognition of EHR sections are still very limited. Ganesan and Subotin [12] proposed an L1-regularized logistic regression model that is capable of recognizing the header, footer, and all of the top-level sections of a clinical note. Tepper et al [17] showed that the two-step approach, which first recognized the section headings and then categorized them, achieved a better performance than the one that combines the two tasks in one step.…”

Section: Discussion (mentioning)

confidence: 99%
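
To make the idea of an L1-regularized logistic regression line classifier concrete, here is a minimal sketch in plain Python. It is not the model from Ganesan and Subotin: the features in `line_features`, the toy training lines, and the hyperparameters are all hypothetical, and the optimizer is a simple proximal gradient loop (soft-thresholding) chosen only because it fits in a few lines.

```python
import math

def soft_threshold(w, t):
    """Proximal operator of the L1 penalty: shrink w toward zero by t."""
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def line_features(line):
    """Hypothetical line-level features (assumed for illustration only)."""
    s = line.strip()
    return [
        1.0 if s.endswith(":") else 0.0,             # ends with a colon
        1.0 if s.isupper() else 0.0,                 # all cased letters are uppercase
        1.0 if len(s.split()) <= 4 else 0.0,         # short line
        1.0 if any(c.isdigit() for c in s) else 0.0, # contains a digit
    ]

def train_l1_logreg(X, y, lam=0.01, lr=0.5, epochs=500):
    """Proximal gradient descent on mean log loss plus lam * ||w||_1."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        # Gradient step on the smooth loss, then soft-threshold the weights.
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b  # the bias is left unpenalized
    return w, b

def predict(w, b, x):
    """Return True when the line is classified as a section heading."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5

# Toy training data (invented): label 1 marks a section heading line.
lines = [
    ("HISTORY OF PRESENT ILLNESS:", 1),
    ("The patient is a 54 year old male with chest pain.", 0),
    ("MEDICATIONS:", 1),
    ("Aspirin 81 mg daily.", 0),
    ("Assessment and Plan:", 1),
    ("Continue current therapy and follow up in 2 weeks.", 0),
]
X = [line_features(text) for text, _ in lines]
y = [label for _, label in lines]
w, b = train_l1_logreg(X, y)
predictions = [predict(w, b, x) for x in X]
```

The L1 penalty tends to drive uninformative weights to exactly zero, which is one common reason to prefer it when many hand-crafted features are available.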

“…In view of this issue, this paper compiled a section-heading recognition corpus on top of the dataset released by the i2b2 2014 shared task [9] and presents a machine learning approach based on the conditional random fields (CRF) model [10] to handle the section-heading recognition task for EHRs. Based on the assumption that the narratives following a recognized section heading should belong to this corresponding section, this work modeled the task as a sequential token labeling problem in a given text, which differs from most of the previous works [11, 12] that formulated the problem as a sentence-by-sentence classification task. The compiled corpus along with the developed model and section-heading recognition tool is publicly available at https://www.sites.google.com/site/hongjiedai/projects/nttmuclinicalnet and http://btm.tmu.edu.tw/nttmuclinicalnet/ in an attempt to facilitate clinical research.…”

Section: Introduction (mentioning)

confidence: 99%

Recognition and Evaluation of Clinical Section Headings in Clinical Documents Using Token-Based Formulation with Conditional Random Fields

Dai

Syed-Abdul

Chen

et al. 2015

BioMed Research International


Electronic health record (EHR) is a digital data format that collects electronic health information about an individual patient or population. To enhance the meaningful use of EHRs, information extraction techniques have been developed to recognize clinical concepts mentioned in EHRs. Nevertheless, the clinical judgment of an EHR cannot be known solely based on the recognized concepts without considering its contextual information. In order to improve the readability and accessibility of EHRs, this work developed a section heading recognition system for clinical documents. In contrast to formulating the section heading recognition task as a sentence classification problem, this work proposed a token-based formulation with the conditional random field (CRF) model. A standard section heading recognition corpus was compiled by annotators with clinical experience to evaluate the performance and compare it with sentence classification and dictionary-based approaches. The results of the experiments showed that the proposed method achieved a satisfactory F-score of 0.942, which outperformed the sentence-based approach and the best dictionary-based system by 0.087 and 0.096, respectively. One important advantage of our formulation over the sentence-based approach is that it presented an integrated solution without the need to develop additional heuristics rules for isolating the headings from the surrounding section contents.
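
The assumption that text following a recognized heading belongs to that section can be sketched as below. This is only an illustration: the regular expression `HEADING_RE` is a hypothetical stand-in for a trained heading recognizer (the paper uses a CRF, which is not reproduced here), and the sample note is invented. Working with character spans rather than whole sentences lets a heading be separated from content that shares its line.

```python
import re

# Hypothetical stand-in for a learned heading recognizer: a capitalized
# phrase of letters and spaces followed by a colon.
HEADING_RE = re.compile(r"[A-Z][A-Za-z ]{2,40}:")

def find_headings(text):
    """Return (start, end) character spans of candidate headings."""
    return [m.span() for m in HEADING_RE.finditer(text)]

def split_by_headings(text, heading_spans):
    """Assign the text after each heading, up to the next heading, to it."""
    sections = []
    for i, (start, end) in enumerate(heading_spans):
        if i + 1 < len(heading_spans):
            next_start = heading_spans[i + 1][0]
        else:
            next_start = len(text)
        name = text[start:end].strip().rstrip(":")
        body = text[end:next_start].strip()
        sections.append((name, body))
    return sections

# Invented sample note with headings inline with their content.
note = ("Chief Complaint: chest pain. "
        "Medications: aspirin 81 mg daily. "
        "Allergies: none known.")
sections = split_by_headings(note, find_headings(note))
```

Any text before the first recognized heading is ignored by this sketch; a real system would need a rule for such leading content.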


“…Research may concentrate on section detection only (Ganesan and Subotin, 2014; Dai et al, 2015), section classification (with section boundaries assumed to be known) (Li et al, 2010; Haug et al, 2014) or both (Apostolova et al, 2009; Denny et al, 2009; Tepper et al, 2012). In this paper we focus on section-level classification and section classification at the sentence level.…”

Section: Related Work (mentioning)

confidence: 99%

“…In this paper we focus on section-level classification and section classification at the sentence level. Prior approaches to section prediction include Support Vector Machines leveraging features computed by bi-gram tf-idf vector representations (Apostolova et al, 2009), Hidden Markov Models (HMM) with sections regarded as part of a sequence (Li et al, 2010), Maximum Entropy Classifiers (Tepper et al, 2012), L1-Regularized Logistic Regression (Ganesan and Subotin, 2014), Bayesian models using N-gram features (Haug et al, 2014), and linear-chain Conditional Random Fields (CRF) to determine section headers (Dai et al, 2015). Most of these approaches rely heavily on hand-crafted features that are time consuming to develop and may not easily generalize across EHRs from different sources.…”

Section: Related Work (mentioning)

confidence: 99%

Leveraging Medical Literature for Section Prediction in Electronic Health Records

Rosenthal

Barker

Liang

2019

Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conferen


Electronic Health Records (EHRs) contain both structured content and unstructured (text) content about a patient's medical history. In the unstructured text parts, there are common sections such as Assessment and Plan, Social History, and Medications. These sections help physicians find information easily and can be used by an information retrieval system to return specific information sought by a user. However, it is common that the exact format of sections in a particular EHR does not adhere to known patterns. Therefore, being able to predict sections and headers in EHRs automatically is beneficial to physicians. Prior approaches in EHR section prediction have only used text data from EHRs and have required significant manual annotation. We propose using sections from medical literature (e.g., textbooks, journals, web content) that contain content similar to that found in EHR sections. Our approach uses data from a different kind of source where labels are provided without the need of a time-consuming annotation effort. We use this data to train two models: an RNN and a BERT-based model. We apply the learned models along with source data via transfer learning to predict sections in EHRs. Our results show that medical literature can provide helpful supervision signal for this classification task.

