Lance De Vine scite author profile

Advances in neural network language models have demonstrated that these models can effectively learn representations of word meaning. In this paper, we explore a variation of neural language models that can learn on concepts taken from structured ontologies and extracted from free text, rather than directly from terms in free text. This model is employed for the task of measuring semantic similarity between medical concepts, a task that is central to a number of techniques in medical informatics and information retrieval. The model is built with two medical corpora (journal abstracts and patient records) and empirically validated on two ground-truth datasets of human-judged concept pairs assessed by medical professionals. Empirically, our approach correlates closely with expert human assessors (≈ 0.9) and outperforms a number of state-of-the-art benchmarks for medical semantic similarity. The demonstrated superiority of this model for providing an effective semantic similarity measure is promising in that this may translate into effectiveness gains for techniques in medical information retrieval and medical informatics (e.g., query expansion and literature-based discovery).
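
The evaluation described in this abstract (scoring concept pairs with a learned representation, then correlating those scores with human judgments) can be illustrated with a small sketch. This is not the authors' code: the concept identifiers, the three-dimensional vectors, and the human scores below are made-up toy values, and cosine similarity with Pearson correlation is assumed as one common way to set up such a comparison.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

# Hypothetical concept embeddings (toy values, not learned from any corpus).
concept_vectors = {
    "C_heart_attack": [0.9, 0.1, 0.0],
    "C_myocardial_infarction": [0.8, 0.2, 0.1],
    "C_fracture": [0.0, 0.1, 0.9],
}

# Hypothetical concept pairs with made-up human similarity ratings.
pairs = [
    ("C_heart_attack", "C_myocardial_infarction"),
    ("C_heart_attack", "C_fracture"),
    ("C_myocardial_infarction", "C_fracture"),
]
human_scores = [4.0, 1.0, 1.5]

# Model score for each pair is the cosine of the two concept vectors.
model_scores = [cosine(concept_vectors[a], concept_vectors[b]) for a, b in pairs]

# Agreement between model and human judgments.
correlation = pearson(model_scores, human_scores)
print(correlation)
```

With real data, the vectors would come from the trained model and the ratings from the expert-judged datasets the abstract mentions.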

Many Paths Lead to Discovery: Analogical Retrieval of Cancer Therapies

Cohen

Widdows

Vine

et al. 2012

Abstract. This paper addresses the issue of analogical inference, and its potential role as the mediator of new therapeutic discoveries, by using disjunction operators based on quantum connectives to combine many potential reasoning pathways into a single search expression. In it, we extend our previous work in which we developed an approach to analogical retrieval using the Predication-based Semantic Indexing (PSI) model, which encodes both concepts and the relationships between them in high-dimensional vector space. As in our previous work, we leverage the ability of PSI to infer predicate pathways connecting two example concepts, in this case comprising known therapeutic relationships. For example, given that drug x TREATS disease z, we might infer the predicate pathway drug x INTERACTS WITH gene y ASSOCIATED WITH disease z, and use this pathway to search for drugs related to another disease in similar ways. As biological systems tend to be characterized by networks of relationships, we evaluate the ability of quantum-inspired operators to mediate inference and retrieval across multiple relations, by testing the ability of different approaches to recover known therapeutic relationships. In addition, we introduce a novel complex-vector-based implementation of PSI, based on Plate's Circular Holographic Reduced Representations, which we utilize for all experiments in addition to the binary-vector-based approach we have applied in our previous research.
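
To make the idea of encoding concepts and predicates in complex vectors more concrete, here is a minimal sketch of binding and release with unit-magnitude complex (phasor) vectors, in the spirit of circular holographic reduced representations. It is an illustration under assumptions, not the paper's implementation: the dimensionality, the helper names (`bind`, `release`, `superpose`, `similarity`), and the toy drug, gene, and disease vectors are all hypothetical.

```python
import cmath
import math
import random

DIM = 1024  # assumed dimensionality, chosen only for this illustration

def random_phasor_vector(rng, dim=DIM):
    """Random vector of unit-magnitude complex numbers (random phase angles)."""
    return [cmath.exp(1j * rng.uniform(-math.pi, math.pi)) for _ in range(dim)]

def bind(a, b):
    """Bind two vectors by element-wise multiplication (phases add)."""
    return [x * y for x, y in zip(a, b)]

def release(bound, a):
    """Undo binding with `a` by multiplying with its complex conjugate."""
    return [x * y.conjugate() for x, y in zip(bound, a)]

def superpose(*vectors):
    """Superpose vectors by element-wise addition."""
    return [sum(components) for components in zip(*vectors)]

def similarity(a, b):
    """Normalized real part of the Hermitian inner product."""
    dot = sum((x * y.conjugate()).real for x, y in zip(a, b))
    norm_a = math.sqrt(sum(abs(x) ** 2 for x in a))
    norm_b = math.sqrt(sum(abs(y) ** 2 for y in b))
    return dot / (norm_a * norm_b)

rng = random.Random(0)

# Hypothetical elemental vectors for concepts and predicates.
gene_y = random_phasor_vector(rng)
disease_z = random_phasor_vector(rng)
TREATS = random_phasor_vector(rng)
INTERACTS_WITH = random_phasor_vector(rng)

# Toy semantic vector for drug x: "TREATS disease z" plus "INTERACTS WITH gene y".
semantic_drug_x = superpose(bind(TREATS, disease_z), bind(INTERACTS_WITH, gene_y))

# Releasing the TREATS predicate yields a noisy copy of disease z.
recovered = release(semantic_drug_x, TREATS)
sim_target = similarity(recovered, disease_z)
sim_other = similarity(recovered, gene_y)
print(sim_target, sim_other)
```

In this toy setting the released vector is much closer to the disease vector than to the unrelated gene vector, which is the property that lets predicate pathways be followed from one concept to another.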

Clinical information extraction using small data: An active learning approach based on sequence representations and word embeddings

Kholghi

Vine

Sitbon

et al. 2017

Asso for Info Science & Tech

This article demonstrates the benefits of using sequence representations based on word embeddings to inform the seed selection and sample selection processes in an active learning pipeline for clinical information extraction. Seed selection refers to choosing an initial sample set to label to form an initial learning model. Sample selection refers to selecting informative samples to update the model at each iteration of the active learning process. Compared to supervised machine learning approaches, active learning offers the opportunity to build statistical classifiers with a reduced amount of training samples that require manual annotation. Reducing the manual annotation effort can support automating the clinical information extraction process. This is particularly beneficial in the clinical domain, where manual annotation is a time-consuming and costly task, as it requires extensive labor from clinical experts. Our empirical findings demonstrate that (a) using sequence representations along with the length of sequence for seed selection shows potential towards more effective initial models, and (b) using sequence representations for sample selection leads to significantly lower manual annotation efforts, with up to 3% and 6% fewer tokens and concepts requiring annotation, respectively, compared to state-of-the-art query strategies.

A Study on the Use of Word Embeddings and PageRank for Vietnamese Text Summarization

Phung

Vine

2015

Quantifying the genericness of trademarks using natural language processing: an introduction with suggested metrics

Shackell

Vine

2021

Artif Intell Law


Medical Semantic Similarity with a Neural Language Model
