The recent rapid increase in the generation of clinical data and the rapid development of computational science make it possible to extract new insights from massive datasets in the healthcare industry. Oncological clinical notes form rich databases that document patients' histories, and they potentially contain many patterns that could help in better management of the disease. However, these patterns are locked within the free-text (unstructured) portions of clinical documents, which limits the ability of health professionals to extract useful information from them and, ultimately, to perform the Query and Answering (Q&A) process accurately. The Information Extraction (IE) process requires Natural Language Processing (NLP) techniques to assign semantics to these patterns. Therefore, in this paper, we analyze the design of annotators for specific lung cancer concepts that can be integrated into the Apache Unstructured Information Management Architecture (UIMA) framework. In addition, we explain the details of the generation and storage of annotation outcomes.
The widespread adoption of Electronic Health Records (EHRs) is generating an ever-increasing amount of unstructured clinical text. Processing time expressions in these domain-specific texts is crucial for discovering patterns that can help in detecting medical events and building the patient's natural history. In the medical domain, recognizing time information in texts is challenging because of their lack of structure; their use of various formats, styles and abbreviations; their domain-specific nature; their writing quality; and the presence of ambiguous expressions. Furthermore, although Spanish ranks second in the world by number of speakers, to the best of our knowledge no Natural Language Processing (NLP) tools have been introduced for recognizing time expressions in clinical texts written in this language. Therefore, in this paper we propose a Temporal Tagger for identifying and normalizing time expressions appearing in Spanish clinical texts. We further compare our Temporal Tagger with the Spanish version of SUTime. Using a large dataset comprising EHRs of people suffering from lung cancer, we show that our Temporal Tagger, with an F1 score of 0.93, outperforms SUTime, with an F1 score of 0.797.
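As an illustration only, the two steps named in this abstract (identifying a time expression, then normalizing it) can be sketched with a small rule-based example. This is not the authors' Temporal Tagger: the function name `tag_time_expressions`, the month lexicon, the two patterns, the day-first reading of numeric dates, and the handling of two-digit years are all assumptions made for this sketch.

```python
import re
from datetime import date

# Hypothetical month lexicon; a real tagger would also cover abbreviations.
MONTHS = {
    "enero": 1, "febrero": 2, "marzo": 3, "abril": 4, "mayo": 5, "junio": 6,
    "julio": 7, "agosto": 8, "septiembre": 9, "octubre": 10,
    "noviembre": 11, "diciembre": 12,
}

# Long form, e.g. "12 de marzo de 2015".
LONG_DATE = re.compile(
    r"\b(\d{1,2}) de (" + "|".join(MONTHS) + r") (?:de|del) (\d{4})\b",
    re.IGNORECASE,
)
# Numeric form, e.g. "03/04/2015" or "3-4-15", read as day, month, year.
NUMERIC_DATE = re.compile(r"\b(\d{1,2})[/-](\d{1,2})[/-](\d{4}|\d{2})\b")


def tag_time_expressions(text):
    """Return (surface form, ISO 8601 date) pairs in order of appearance."""
    found = []
    for m in LONG_DATE.finditer(text):
        day, month, year = int(m.group(1)), MONTHS[m.group(2).lower()], int(m.group(3))
        found.append((m.start(), m.group(0), day, month, year))
    for m in NUMERIC_DATE.finditer(text):
        day, month, year = int(m.group(1)), int(m.group(2)), int(m.group(3))
        if year < 100:
            year += 2000  # assumption: two-digit years belong to the 2000s
        found.append((m.start(), m.group(0), day, month, year))

    results = []
    for _, surface, day, month, year in sorted(found):
        try:
            results.append((surface, date(year, month, day).isoformat()))
        except ValueError:
            continue  # skip impossible dates such as 31/02/2015
    return results


note = "Paciente diagnosticado el 12 de marzo de 2015. Inicia quimioterapia el 03/04/2015."
print(tag_time_expressions(note))
```

A rule-based sketch like this covers only explicit calendar dates; relative expressions, durations, and abbreviations, which the abstract lists among the difficulties of clinical text, would need additional rules or a different approach.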
The automatic reconstruction of patients' treatment lines from their Electronic Health Records (EHRs) is a significant step towards improving the quality and safety of healthcare delivery. With the recent rapid increase in the adoption of EHRs and the rapid development of computational science, we can discover new insights from the information stored in EHRs. However, this remains a challenging task, and the analysis of unstructured data is one of the main difficulties. In this paper, we focus on the most common challenges in reconstructing a patient's treatment lines: Named Entity Recognition (NER), temporal relation identification, and the integration of structured results. We introduce our Natural Language Processing (NLP) framework, which addresses these challenges. In addition, we focus on a real use case of patients suffering from lung cancer, extracting patterns associated with the treatment of the disease that can help clinicians analyze toxicities and patterns depending on the lines of treatment given to the patient.