Lucía Prieto Santamaría scite author profile

Sentiment analysis is one of the hottest topics in natural language processing and has attracted great interest from both science and industry. Identifying the sentiment expressed in a piece of text is a challenging task that several commercial tools have tried to address. While trying to capture the sentiment expressed in a set of tweets retrieved for a study about vaccines and diseases during the period 2015–2018, we found that some of the main commercial tools did not accurately identify the sentiment expressed in a tweet. For this reason, we set out to create a meta-model that combines the outputs of the commercial tools to improve on the results each tool achieves individually. As part of this research, we had to deal with the problem of unbalanced data. This paper presents the main results of creating a meta-model from three commercial tools for the correct identification of sentiment in tweets, using different machine-learning techniques and methods and addressing the unbalanced data problem.
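As a rough illustration of the meta-model idea described in this abstract, the sketch below combines the labels of three tools and rebalances the training data by random oversampling. It is only a sketch under stated assumptions: the abstract does not say which learners or balancing methods were used, so the lookup-table learner, the function names and the toy rows are all hypothetical stand-ins.

```python
import random
from collections import Counter, defaultdict


def oversample(rows, seed=0):
    """Randomly duplicate minority-class rows until every class is equally frequent."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for features, label in rows:
        by_label[label].append((features, label))
    target = max(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(group)
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))
    return balanced


def fit_meta_model(rows):
    """Map each combination of tool outputs to its most common true label."""
    votes = defaultdict(Counter)
    for features, label in rows:
        votes[features][label] += 1
    return {features: counts.most_common(1)[0][0] for features, counts in votes.items()}


def predict(model, features):
    """Use the learned table; fall back to a majority vote for unseen combinations."""
    if features in model:
        return model[features]
    return Counter(features).most_common(1)[0][0]


# Hypothetical training rows: (labels from tool 1, tool 2, tool 3), true label.
rows = [
    (("pos", "pos", "pos"), "pos"),
    (("pos", "pos", "neu"), "pos"),
    (("pos", "neu", "pos"), "pos"),
    (("neu", "neu", "neu"), "neu"),
    (("neu", "neu", "neu"), "neu"),
    (("neu", "neu", "neg"), "neg"),  # two of three tools miss the negative tweet
]

balanced = oversample(rows)
model = fit_meta_model(balanced)
```

In this toy data a plain majority vote would label `("neu", "neu", "neg")` as neutral, while the table learned from the true labels returns negative, which is the kind of correction a meta-model is meant to capture.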

A data-driven methodology towards evaluating the potential of drug repurposing hypotheses

Santamaría

Carro

Uzquiano

et al. 2021

Computational and Structural Biotechnology Journal

DISNET: a framework for extracting phenotypic disease information from public sources

García

Rodríguez‐González

Santamaría

et al. 2020

Background: Within the global endeavour of improving population health, one major challenge is the identification and integration of medical knowledge spread across several information sources. Creating a comprehensive dataset of diseases and their clinical manifestations from public sources is an interesting approach: it allows one not only to complement and merge medical knowledge but also to extend it, and thereby to interconnect existing data and to analyse and relate diseases to each other. In this paper, we present DISNET (http://disnet.ctb.upm.es/), a web-based system designed to periodically extract knowledge from signs and symptoms retrieved from medical databases, and to enable the creation of customisable disease networks.

Methods: We present the main features of the DISNET system. We describe how information on diseases and their phenotypic manifestations is extracted from the Wikipedia and PubMed websites; specifically, texts from these sources are processed through a combination of text mining and natural language processing techniques.

Results: We further present the validation of our system on Wikipedia and PubMed texts and report the resulting accuracy. The final output includes a comprehensive symptoms-disease dataset, shared (free access) through the system’s API. Finally, we describe, with some simple use cases, how a user can interact with the system and extract information that could be used for subsequent analyses.

Discussion: DISNET allows retrieving knowledge about the signs, symptoms and diagnostic tests associated with a disease. It is not limited to a specific category (it covers all the categories that the selected sources of information offer) or to specific clinical diagnosis terms. It also allows tracking the evolution of those terms over time, and thus offers an opportunity to analyse and observe the progress of human knowledge on diseases. We further discuss the validation of the system, which suggests that it is good enough to be used to extract diseases and diagnostically relevant terms. At the same time, the evaluation also revealed that improvements could be introduced to enhance the system’s reliability.

Integrating heterogeneous data to facilitate COVID-19 drug repurposing

Santamaría

Uzquiano

Carro

et al. 2022

Drug Discovery Today

In the COVID-19 pandemic, drug repositioning has presented itself as an alternative to the time-consuming process of developing new drugs. This review describes a drug repurposing process based on a new data-driven approach: we put forward five information paths that associate COVID-19-related genes and COVID-19 symptoms with drugs that directly target these gene products, that target the symptoms, or that treat diseases that are symptomatically or genetically similar to COVID-19. The intersection of the five information paths yields a list of 13 drugs that we suggest as potential candidates against COVID-19. In addition, we have found information in published studies and in clinical trials that supports the therapeutic potential of the drugs in our final list.
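The intersection step mentioned in this abstract can be pictured as a plain set intersection over the candidate lists produced by each information path. The sketch below is illustrative only: the path names and drug identifiers are placeholders invented for the example, not the paths or the 13 drugs reported in the review.

```python
# Hypothetical candidate sets, one per information path.
# Path names and drug identifiers are placeholders, not data from the review.
PATHS = {
    "path_1": {"drug_a", "drug_b", "drug_c", "drug_d"},
    "path_2": {"drug_a", "drug_b", "drug_d", "drug_e"},
    "path_3": {"drug_a", "drug_b", "drug_c", "drug_e"},
    "path_4": {"drug_a", "drug_b", "drug_f"},
    "path_5": {"drug_a", "drug_b", "drug_d", "drug_g"},
}


def intersect_paths(paths):
    """Return the drugs that appear in the candidate set of every path."""
    if not paths:
        return set()
    return set.intersection(*(set(drugs) for drugs in paths.values()))


candidates = intersect_paths(PATHS)
print(sorted(candidates))
```

Requiring a drug to appear in every path is a strict filter: it trades recall for candidates that are supported by all lines of evidence at once.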

DISNET: A framework for extracting phenotypic disease information from public sources

García

Santamaría

Valle

et al. 2018

Preprint

Within the global endeavour of improving population health, one major challenge is the increasingly high cost associated with drug development. Drug repositioning, i.e. finding new uses for existing drugs, is a promising alternative; yet, its effectiveness has hitherto been hindered by our limited knowledge about diseases and their relationships. In this paper, we present DISNET (disnet.ctb.upm.es), a web-based system designed to extract knowledge from signs and symptoms retrieved from medical databases, and to enable the creation of customisable disease networks. We here present the main features of the DISNET system. We describe how information on diseases and their phenotypic manifestations is extracted from Wikipedia, PubMed and Mayo Clinic; specifically, texts from these sources are processed through a combination of text mining and natural language processing techniques. We further present a validation of the processing performed by the system; and describe, with some simple use cases, how a user can interact with it and extract information that could be used for subsequent analyses.

DisMaNET: A network-based tool to cross map disease vocabularies

Valle

García

Santamaría

et al. 2021

Computer Methods and Programs in Biomedicine

Background and Objectives: The growing integration of healthcare sources is improving our understanding of diseases. Cross-mapping resources such as UMLS play a very important role in this area, but their coverage is still incomplete. To facilitate the integration and interoperability of biological, clinical and literature sources in studies of diseases, we built DisMaNET, a system to cross-map terms from disease vocabularies by leveraging the power and intuitiveness of network analysis.

Methods: First, we collected and normalized data from 8 disease vocabularies and mapping sources to generate our datasets. Next, we built DisMaNET by integrating the generated datasets into a Neo4j graph database. Then we exploited the query mechanisms of Neo4j to cross-map disease terms of different vocabularies with a relevance score metric and contrasted the results with some state-of-the-art solutions. Finally, we made our system publicly available for use and evaluation through both a graphical user interface and REST APIs.

Results: DisMaNET contains almost half a million nodes and nearly nine hundred thousand edges, including hierarchical and mapping relationships. Its query capabilities enabled the detection of connections between disease vocabularies that are not present in major mapping sources such as UMLS and the Disease Ontology, even for rare diseases. Furthermore, DisMaNET was capable of obtaining more than 80% of the mappings with UMLS reported in MonDO and DisGeNET. Our tool was used successfully to complete the missing mappings in DISNET, a web-based system designed to extract knowledge from signs and symptoms retrieved from medical databases.

Conclusions: DisMaNET is a powerful, intuitive and publicly available system to cross-map terms from different disease vocabularies. Its completeness and the potential of network analysis make it a competitive alternative to existing mapping systems. Expansion with new sources, versioning and the improvement of the search and scoring algorithms are envisioned as future lines of work.
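To make the network-based cross-mapping idea more concrete, here is a minimal in-memory sketch. It does not use Neo4j and does not reproduce DisMaNET's relevance score, which the abstract does not define: the breadth-first search, the `1 / hops` score, the vocabulary prefixes and the term identifiers are all assumptions made for illustration.

```python
from collections import deque

# Hypothetical mapping edges between terms of three made-up vocabularies (A, B, C).
EDGES = [
    ("A:001", "B:010"),
    ("B:010", "C:100"),
    ("A:001", "C:200"),
    ("A:002", "B:020"),
]


def build_graph(edges):
    """Build an undirected adjacency map from pairs of mapped terms."""
    graph = {}
    for left, right in edges:
        graph.setdefault(left, set()).add(right)
        graph.setdefault(right, set()).add(left)
    return graph


def cross_map(graph, source, target_vocab, max_hops=3):
    """Breadth-first search from `source`; score reachable target-vocabulary terms.

    The score 1 / hops is an assumed stand-in for a relevance metric:
    direct mappings score 1.0, two-hop mappings 0.5, and so on.
    """
    hops = {source: 0}
    queue = deque([source])
    results = {}
    while queue:
        term = queue.popleft()
        if hops[term] == max_hops:
            continue
        for neighbour in sorted(graph.get(term, ())):
            if neighbour in hops:
                continue
            hops[neighbour] = hops[term] + 1
            queue.append(neighbour)
            if neighbour.split(":")[0] == target_vocab:
                results[neighbour] = 1 / hops[neighbour]
    return results


graph = build_graph(EDGES)
mappings = cross_map(graph, "A:001", "C")
```

Here `A:001` maps to `C:200` directly and to `C:100` only through `B:010`, so the indirect mapping is found but receives a lower score.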

Influenza and Measles-MMR: two case study of the trend and impact of vaccine-related Twitter posts in Spanish during 2015-2018

Santamaría

Tuñas

Peces-Barba

et al. 2021

Human Vaccines & Immunotherapeutics

Social media, and in particular Twitter, can be a resource of enormous value for retrieving information about the opinion of the general population on vaccines. The increasing popularity of this platform has made it possible to use its content to obtain a clear picture of its users' views on this topic. In this paper, we study vaccine-related messages published in Spanish during 2015-2018. More specifically, the paper focuses on two diseases: influenza and measles (with MMR as its vaccine). By also analysing the sentiment expressed in the published tweets, we have been able to identify the type of messages that are published on Twitter with respect to these two pathologies and their vaccines. Results showed that, contrary to popular opinion, most of the messages published are non-negative. On the other hand, the analysis showed that some messages attracted a great deal of attention and provoked peaks in the number of published tweets, explaining some changes in the observed trends.
