Sentiment analysis is one of the most active topics in natural language processing, and it has attracted considerable interest from both the scientific and industrial communities. Identifying the sentiment expressed in a piece of text is a challenging task that several commercial tools have tried to address. While trying to capture the sentiment expressed in a set of tweets retrieved for a study about vaccines and diseases during the period 2015–2018, we found that some of the main commercial tools did not accurately identify the sentiment expressed in a tweet. For this reason, we set out to create a meta-model that uses the outputs of the commercial tools to improve on the results of each tool individually. As part of this research, we had to deal with the problem of unbalanced data. This paper presents the main results of creating a meta-model from three commercial tools for the correct identification of sentiment in tweets, using different machine-learning techniques and methods and addressing the unbalanced data problem.
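The meta-model idea described above can be illustrated with a minimal sketch. It is not the authors' method: the lookup-table learner, the inverse-frequency class weights, the function names, and the toy labels are all assumptions chosen to keep the example self-contained. It only shows how the labels of three tools can be combined into one prediction while compensating for an over-represented class.

```python
from collections import Counter, defaultdict


def fit_meta_model(tool_outputs, gold_labels):
    """Learn a lookup table from a triple of tool labels to a final label.

    Each training example votes for its gold label with a weight equal to the
    inverse of that label's frequency, so a rare class is not drowned out by
    the majority class (a simple way to handle unbalanced data).
    """
    class_counts = Counter(gold_labels)
    weights = {label: 1.0 / count for label, count in class_counts.items()}
    votes = defaultdict(Counter)
    for triple, gold in zip(tool_outputs, gold_labels):
        votes[tuple(triple)][gold] += weights[gold]
    return {triple: c.most_common(1)[0][0] for triple, c in votes.items()}


def predict(table, triple):
    """Use the learned table; fall back to a majority vote for unseen triples."""
    triple = tuple(triple)
    if triple in table:
        return table[triple]
    return Counter(triple).most_common(1)[0][0]


# Hypothetical toy data: each triple holds the labels of three tools for one tweet.
tool_outputs = (
    [("neu", "neu", "neg")] * 3
    + [("neu", "neu", "neu")] * 4
    + [("neg", "neg", "neg")]
)
gold_labels = ["neu", "neu", "neg"] + ["neu"] * 4 + ["neg"]

table = fit_meta_model(tool_outputs, gold_labels)
```

In this toy set the triple `("neu", "neu", "neg")` is labelled neutral twice and negative once, but because negative tweets are three times rarer overall, the weighted vote maps that triple to `"neg"`. An unweighted vote would have chosen `"neu"`.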
Owing to the complexity of the human body, most diseases present a high interpersonal variability in the way they manifest, i.e. in their phenotype. This has important clinical repercussions, for instance the difficulty of defining objective diagnostic rules. Here we explore the hypothesis that signs and symptoms used to define a disease should be understood in terms of the dispersion (as opposed to the average) of physical observables. To that end, we propose a computational framework, based on complex networks theory, that maps groups of subjects to a network structure according to their pairwise phenotypical similarity. We demonstrate, with both synthetic and real datasets, that the resulting structure can be used to improve the performance of classification algorithms, especially when the number of instances is limited. Beyond providing an alternative conceptual understanding of diseases, the proposed framework could be of special relevance in the growing field of personalized, or -to-1, medicine.
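As a rough illustration of mapping subjects to a network by pairwise similarity, the sketch below links two subjects whenever their observables are close enough. The Euclidean distance, the threshold value, the function names, and the toy measurements are assumptions made for this example, not the framework proposed in the paper.

```python
import math
from itertools import combinations


def similarity_network(subjects, threshold):
    """Return the set of edges (i, j) linking subjects closer than `threshold`.

    Each subject is a tuple of numeric observables; similarity is taken here
    as Euclidean distance below a fixed threshold (an assumed choice).
    """
    edges = set()
    for (i, a), (j, b) in combinations(enumerate(subjects), 2):
        if math.dist(a, b) < threshold:
            edges.add((i, j))
    return edges


def link_density(n_nodes, edges):
    """Fraction of possible pairs that are linked; low values mean high dispersion."""
    possible = n_nodes * (n_nodes - 1) // 2
    return len(edges) / possible if possible else 0.0


# Hypothetical observables for four subjects; the last one is an outlier.
subjects = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.2), (3.0, 0.2)]
edges = similarity_network(subjects, threshold=0.5)
density = link_density(len(subjects), edges)
```

Here the first three subjects form a fully connected group and the fourth stays isolated, so three of the six possible links exist. A structural measure such as this link density could then be passed to a classifier as an additional feature describing how dispersed a group is.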
The recent rapid increase in the generation of clinical data, together with the rapid development of computational science, enables us to extract new insights from massive datasets in the healthcare industry. Oncological clinical notes form rich databases documenting patients' histories, and they potentially contain many patterns that could help in better management of the disease. However, these patterns are locked within the free-text (unstructured) portions of clinical documents, which limits the ability of health professionals to extract useful information from them and, ultimately, to perform the Query and Answering (Q&A) process accurately. The Information Extraction (IE) process requires Natural Language Processing (NLP) techniques to assign semantics to these patterns. Therefore, in this paper, we analyze the design of annotators for specific lung cancer concepts that can be integrated into the Apache Unstructured Information Management Architecture (UIMA) framework. In addition, we explain the details of the generation and storage of annotation outcomes.
Although Electronic Health Records (EHRs) contain a large amount of information about the patient's condition and response to treatment, which could potentially revolutionize clinical practice, such information is seldom considered because of the complexity of its extraction and analysis. Here we report on a first integration of an NLP framework for the analysis of clinical records of lung cancer patients who make use of a telephone assistance service of a major Spanish hospital. We specifically show how some relevant data about patient demographics and health condition can be extracted, and how some relevant analyses can be performed, aimed at improving the usefulness of the service. We thus demonstrate that the use of EHR texts, and their integration inside a data analysis framework, is technically feasible and worthy of further study.