We aim to build and evaluate an open-source natural language processing system for information extraction from electronic medical record clinical free-text. We describe and evaluate our system, the clinical Text Analysis and Knowledge Extraction System (cTAKES), released open-source at http://www.ohnlp.org. cTAKES builds on existing open-source technologies: the Unstructured Information Management Architecture framework and the OpenNLP natural language processing toolkit. Its components, specifically trained for the clinical domain, create rich linguistic and semantic annotations. Performance of individual components: sentence boundary detector accuracy=0.949; tokenizer accuracy=0.949; part-of-speech tagger accuracy=0.936; shallow parser F-score=0.924; named entity recognizer and system-level evaluation F-score=0.715 for exact and 0.824 for overlapping spans; and accuracy for concept mapping, negation, and status attributes of 0.957, 0.943, and 0.859 for exact spans and 0.580, 0.939, and 0.839 for overlapping spans, respectively. Overall performance is discussed against five applications. The cTAKES annotations are the foundation for methods and modules for higher-level semantic processing of clinical free-text.
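The distinction between exact and overlapping span matching in the named entity scores above can be illustrated with a minimal Python sketch. This is not the cTAKES evaluation code: the function names, the character-offset span representation, and the toy spans are all assumptions made for illustration.

```python
from typing import List, Tuple

# Assumed representation: (start, end) character offsets, end exclusive.
Span = Tuple[int, int]


def overlaps(a: Span, b: Span) -> bool:
    """True if the two spans share at least one character."""
    return a[0] < b[1] and b[0] < a[1]


def span_f_score(gold: List[Span], predicted: List[Span], exact: bool = True) -> float:
    """F-score over entity spans under exact or overlapping matching."""
    if exact:
        def match(x: Span, y: Span) -> bool:
            return x == y
    else:
        match = overlaps

    # A predicted span counts toward precision if it matches any gold span;
    # a gold span counts toward recall if it matches any predicted span.
    matched_pred = sum(1 for p in predicted if any(match(p, g) for g in gold))
    matched_gold = sum(1 for g in gold if any(match(g, p) for p in predicted))

    precision = matched_pred / len(predicted) if predicted else 0.0
    recall = matched_gold / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: one exact hit, one partial hit, one miss.
gold_spans = [(0, 5), (10, 20), (30, 35)]
predicted_spans = [(0, 5), (12, 20), (40, 45)]

exact_f = span_f_score(gold_spans, predicted_spans, exact=True)
overlap_f = span_f_score(gold_spans, predicted_spans, exact=False)
print(f"exact F={exact_f:.3f}, overlapping F={overlap_f:.3f}")
```

In the toy data the partially aligned span (12, 20) is rejected under exact matching and accepted under overlapping matching, which is why overlapping-span scores are never lower than exact-span scores for the same output.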
Information in electronic medical records is often in an unstructured free-text format. This format presents challenges for expedient data retrieval and may fail to convey important findings. Natural language processing (NLP) is an emerging technique for rapid and efficient clinical data retrieval. While proven in disease detection, the utility of NLP in discerning disease progression from free-text reports is untested. We aimed to (1) assess whether unstructured radiology reports contained sufficient information for tumor status classification; (2) develop an NLP-based data extraction tool to determine tumor status from unstructured reports; and (3) compare NLP and human tumor status classification outcomes. Consecutive follow-up brain tumor magnetic resonance imaging reports (2000–2007) from a tertiary center were manually annotated using consensus guidelines on tumor status. Reports were randomized to NLP training (70%) or testing (30%) groups. The NLP tool utilized a support vector machines model with statistical and rule-based outcomes. Most reports had sufficient information for tumor status classification, although 0.8% did not describe status despite reference to prior examinations. Tumor size was unreported in 68.7% of documents, while 50.3% lacked data on change magnitude when there was detectable progression or regression. Using retrospective human classification as the gold standard, NLP achieved 80.6% sensitivity and 91.6% specificity for tumor status determination (mean positive predictive value, 82.4%; negative predictive value, 92.0%). In conclusion, most reports contained sufficient information for tumor status determination, though variable features were used to describe status. NLP demonstrated good accuracy for tumor status classification and may have novel application for automated disease status classification from electronic databases.
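The sensitivity, specificity, and predictive values reported above all derive from a confusion matrix against the human gold standard. The sketch below shows the arithmetic for a single class treated one-vs-rest. The counts are hypothetical and do not come from the study, and the class and function names are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ConfusionCounts:
    """Counts for one class, with human annotation as the gold standard."""
    tp: int  # tool and human both assign the class
    fp: int  # tool assigns the class, human does not
    tn: int  # neither assigns the class
    fn: int  # human assigns the class, tool does not


def diagnostic_metrics(c: ConfusionCounts) -> Dict[str, float]:
    """Standard diagnostic accuracy measures from confusion counts."""
    return {
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "ppv": c.tp / (c.tp + c.fp),
        "npv": c.tn / (c.tn + c.fn),
    }


# Hypothetical counts, NOT taken from the study.
example_counts = ConfusionCounts(tp=80, fp=10, tn=90, fn=20)
example_metrics = diagnostic_metrics(example_counts)
for name, value in example_metrics.items():
    print(f"{name}: {value:.3f}")
```

For a multi-class outcome such as tumor status, one plausible reading of the "mean" predictive values is an average of these per-class figures, though the abstract does not state how the averaging was done.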
Although joint inference is an effective approach to avoid cascading of errors when inferring multiple natural language tasks, its application to information extraction has been limited to modeling only two tasks at a time, leading to modest improvements. In this paper, we focus on the three crucial tasks of automated extraction pipelines: entity tagging, relation extraction, and coreference. We propose a single, joint graphical model that represents the various dependencies between the tasks, allowing flow of uncertainty across task boundaries. Since the resulting model has a high tree-width and contains a large number of variables, we present a novel extension to belief propagation that sparsifies the domains of variables during inference. Experimental results show that our joint model consistently improves results on all three tasks as we represent more dependencies. In particular, our joint model obtains 12% error reduction on tagging over the isolated models.
Objective
To assess the efficacy and safety of transarterial chemoembolization (TACE) combined with lenvatinib plus sintilimab in unresectable hepatocellular carcinoma (HCC).
Patients and Methods
The data of patients with unresectable HCC administered a combination therapy with TACE and lenvatinib plus sintilimab were retrospectively assessed. Patients received lenvatinib orally once daily 2 weeks before TACE, followed by sintilimab administration at 200 mg intravenously on day 1 of a 21-day therapeutic cycle after TACE. The primary endpoints were objective response rate (ORR) and duration of response (DOR) by the modified RECIST criteria.
Results
Median duration of follow-up was 12.5 months (95% CI 9.1 to 14.8 months). ORR was 46.7% (28/60). Median DOR in confirmed responders was 10.0 months (95% CI 9.0-11.0 months). Median progression-free survival (PFS) was 13.3 months (95% CI 11.9-14.7 months). Median overall survival (OS) was 23.6 months (95% CI 22.2-25.0 months).
Conclusions
TACE combined with lenvatinib plus sintilimab is a promising therapeutic regimen in unresectable hepatocellular carcinoma.
Objective
Astrocytes actively participate in energy metabolism in the brain, and astrocytic aerobic glycolysis disorder is associated with the pathology of Alzheimer's disease (AD). GLP-1 has been shown to improve cognition in AD; however, the mechanism remains unclear. The objectives of this study were to assess GLP-1's glycolytic regulation effects in AD and reveal its neuroprotective mechanisms.
Methods
The Morris water maze test was used to evaluate the effects of liraglutide (an analog of GLP-1) on the cognition of 4-month-old 5×FAD mice, and a proteomic analysis and Western blotting were used to assess the proteomic profile changes. We constructed an astrocytic model of AD by treating primary astrocytes with Aβ1-42. The levels of NAD+ and lactate were examined, and the oxidative levels were assessed by a Seahorse examination. Astrocyte-neuron co-culture was performed to evaluate the effects of GLP-1 on astrocytes’ neuronal support.
Results
GLP-1 improved cognition in 4-month-old 5×FAD mice by enhancing aerobic glycolysis and reducing oxidative phosphorylation (OXPHOS) levels and oxidative stress in the brain. GLP-1 also alleviated Aβ-induced glycolysis declines in astrocytes, which resulted in reduced OXPHOS levels and reactive oxygen species (ROS) production. The mechanism involved the activation of the PI3K/Akt pathway by GLP-1. Elevation in astrocytic glycolysis improved astrocytes’ support of neurons and promoted neuronal survival and axon growth.
Conclusions
Taken together, we revealed GLP-1's capacity to regulate astrocytic glycolysis, providing mechanistic insight into one of its neuroprotective roles in AD and support for the feasibility of energy regulation treatments for AD.
Coreference resolution is the task of determining which linguistic expressions refer to the same real-world entity in natural language. Research on coreference resolution in the general English domain dates back to the 1960s and 1970s. However, research on coreference resolution in clinical free text has not seen major development. The recent US government initiatives that promote the use of electronic health records (EHRs) provide opportunities to mine patient notes as more and more health care institutions adopt EHRs. Our goal was to review recent advances in general purpose coreference resolution to lay the foundation for methodologies in the clinical domain, facilitated by the availability of a shared lexical resource of gold standard coreference annotations, the Ontology Development and Information Extraction (ODIE) corpus.
The first step toward the development of an anaphoric relation resolver as part of a comprehensive natural language processing system geared specifically for the clinical narrative in the electronic medical record is described. The deidentified annotated corpus will be available to researchers.