Michelle Vanni scite author profile

Mining entity synonym sets (i.e., sets of terms referring to the same entity) is an important task for many entity-leveraging applications. Previous work either rank terms based on their similarity to a given query term, or treats the problem as a two-phase task (i.e., detecting synonymy pairs, followed by organizing these pairs into synonym sets). However, these approaches fail to model the holistic semantics of a set and suffer from the error propagation issue. Here we propose a new framework, named SynSetMine, that efficiently generates entity synonym sets from a given vocabulary, using example sets from external knowledge bases as distant supervision. SynSetMine consists of two novel modules: (1) a set-instance classifier that jointly learns how to represent a permutation invariant synonym set and whether to include a new instance (i.e., a term) into the set, and (2) a set generation algorithm that enumerates the vocabulary only once and applies the learned set-instance classifier to detect all entity synonym sets in it. Experiments on three real datasets from different domains demonstrate both effectiveness and efficiency of SynSetMine for mining entity synonym sets.

show abstract

Exploiting Relevance Feedback in Knowledge Graph Search

Yang

Sun

et al. 2015

View full text Add to dashboard Cite

The big data era is witnessing a prevalent shift of data from homogeneous to heterogeneous, from isolated to linked. Exemplar outcomes of this shift are a wide range of graph data such as information, social, and knowledge graphs. The unique characteristics of graph data are challenging traditional search techniques like SQL and keyword search. Graph query is emerging as a promising complementary search form. In this paper, we study how to improve graph query by relevance feedback. Specifically, we focus on knowledge graph query, and formulate the graph relevance feedback (GRF) problem. We propose a general GRF framework that is able to (1) tune the original ranking function based on user feedback and (2) further enrich the query itself by mining new features from user feedback. As a consequence, a query-specific ranking function is generated, which is better aligned with the user search intent. Given a newly learned ranking function based on user feedback, we further investigate whether we shall re-rank the existing answers, or choose to search from scratch. We propose a strategy to train a binary classifier to predict which action will be more beneficial for a given query. The GRF framework is applied to searching DBpedia with graph queries derived from YAGO and Wikipedia. Experiment results show that GRF can improve the mean average precision by 80% to 100%.

show abstract

TaxoGen

Zhang

Tao

Chen

et al. 2018

View full text Add to dashboard Cite

Taxonomy construction is not only a fundamental task for semantic analysis of text corpora, but also an important step for applications such as information filtering, recommendation, and Web search. Existing pattern-based methods extract hypernym-hyponym term pairs and then organize these pairs into a taxonomy. However, by considering each term as an independent concept node, they overlook the topical proximity and the semantic correlations among terms. In this paper, we propose a method for constructing topic taxonomies, wherein every node represents a conceptual topic and is defined as a cluster of semantically coherent concept terms. Our method, TaxoGen, uses term embeddings and hierarchical clustering to construct a topic taxonomy in a recursive fashion. To ensure the quality of the recursive process, it consists of: (1) an adaptive spherical clustering module for allocating terms to proper levels when splitting a coarse topic into fine-grained ones; (2) a local embedding module for learning term embeddings that maintain strong discriminative power at different levels of the taxonomy. Our experiments on two real datasets demonstrate the effectiveness of TaxoGen compared with baseline methods.

show abstract

Query Answering Efficiency in Expert Networks Under Decentralized Search

Srivatsa

Cansever³

et al. 2016

View full text Add to dashboard Cite

SynSetExpan: An Iterative Framework for Joint Entity Set Expansion and Synonym Discovery

Shen¹,

Qiu²,

Shang³

et al. 2020

View full text Add to dashboard Cite

Entity set expansion and synonym discovery are two critical NLP tasks. Previous studies accomplish them separately, without exploring their interdependences. In this work, we hypothesize that these two tasks are tightly coupled because two synonymous entities tend to have similar likelihoods of belonging to various semantic classes. This motivates us to design SynSetExpan, a novel framework that enables two tasks to mutually enhance each other. SynSetExpan uses a synonym discovery model to include popular entities' infrequent synonyms into the set, which boosts the set expansion recall. Meanwhile, the set expansion model, being able to determine whether an entity belongs to a semantic class, can generate pseudo training data to fine-tune the synonym discovery model towards better accuracy.To facilitate the research on studying the interplays of these two tasks, we create the first large-scale Synonym-Enhanced Set Expansion (SE2) dataset via crowdsourcing. Extensive experiments on the SE2 dataset and previous benchmarks demonstrate the effectiveness of SynSetExpan for both entity set expansion and synonym discovery tasks. * Equal Contributions.

show abstract

Human-Assisted Machine Information Exploitation: a crowdsourced investigation of information-based problem solving

Kase

Vanni

Caylor

et al. 2017

View full text Add to dashboard Cite

Inter-Rater Agreement Measures and the Refinement of Metrics in the PLATO MT Evaluation Paradigm

Miller

Vanni²

2005

View full text Add to dashboard Cite

The PLATO machine translation (MT) evaluation (MTE) research program has as a goal the systematic development of a predictive relationship between discrete, welldefined MTE metrics and the specific information processing tasks that can be reliably performed with output. Traditional measures of quality, informed by the International Standards for Language Engineering (ISLE), namely, clarity, coherence, morphology, syntax, general and domain-specific lexical robustness, and named-entity translation, as well as a DARPAinspired measure of adequacy are its core. For robust validation, indispensable for refinement of tests and guidelines, we measure inter-rater reliability on the assessments. Here we report on our results, focusing on the PLATO Clarity and Coherence assessments, and we discuss our method for iteratively refining both the linguistic metrics and the guidelines for applying them within the PLATO evaluation paradigm. Finally, we discuss reasons why kappa might not be the best measure of interrater agreement for our purposes, and suggest directions for future investigation.

show abstract

12 3 4

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Michelle Vanni

Beyond I.I.D.: Three Levels of Generalization for Question Answering on Knowledge Bases

Mining Entity Synonyms with Efficient Neural Set Generation

Exploiting Relevance Feedback in Knowledge Graph Search

TaxoGen

Query Answering Efficiency in Expert Networks Under Decentralized Search

SynSetExpan: An Iterative Framework for Joint Entity Set Expansion and Synonym Discovery

Human-Assisted Machine Information Exploitation: a crowdsourced investigation of information-based problem solving

Inter-Rater Agreement Measures and the Refinement of Metrics in the PLATO MT Evaluation Paradigm

Contact Info

Product

Resources

About