Bilyana Taneva scite author profile

While images of famous people and places are abundant on the Internet, they are much harder to retrieve for less popular entities such as notable computer scientists or regionally interesting churches. Querying the entity names in image search engines yields large candidate lists, but they often have low precision and unsatisfactory recall. In this paper, we propose a principled model for finding images of rare or ambiguous named entities. We propose a set of efficient, light-weight algorithms for identifying entity-specific keyphrases from a given textual description of the entity, which we then use to score candidate images based on the matches of keyphrases in the underlying Web pages. Our experiments show the high precision-recall quality of our approach.
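The scoring step described in the abstract above can be illustrated with a minimal sketch. This is not the paper's model: the function names, the substring-matching rule, and the keyphrase weights are all assumptions made for illustration. Each candidate image is represented only by the text of its underlying Web page, and its score is the sum of the weights of the keyphrases that occur in that text.

```python
def score_page(page_text, keyphrases):
    """Sum the weights of all keyphrases found in the page text.

    `keyphrases` maps phrase -> weight (hypothetical weights; the abstract
    does not say how keyphrases are weighted).
    """
    text = page_text.lower()
    return sum(w for phrase, w in keyphrases.items() if phrase.lower() in text)


def rank_candidates(candidates, keyphrases):
    """Return candidate ids ordered by descending score, ties broken by id."""
    scored = [(score_page(text, keyphrases), cid) for cid, text in candidates.items()]
    return [cid for _, cid in sorted(scored, key=lambda t: (-t[0], t[1]))]


# Hypothetical keyphrases for an entity and hypothetical candidate pages.
keyphrases = {"database systems": 2.0, "max planck institute": 1.5}
candidates = {
    "img_a": "Homepage at the Max Planck Institute, research on database systems",
    "img_b": "A football player with the same name",
    "img_c": "Invited talk on database systems",
}
ranking = rank_candidates(candidates, keyphrases)
```

In this toy setup, a page that matches more (or heavier) keyphrases ranks higher, which is what lets images of a namesake with an unrelated page fall to the bottom of the list.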

Efficient Set Intersection Counting Algorithm for Text Similarity Measures

Lahoti

Nicholson

Taneva

2017

Set intersection counting appears as a subroutine in many techniques used in natural language processing, in which similarity is often measured as a function of document co-occurrence counts between pairs of noun phrases or entities. Such techniques include clustering of text phrases and named entities, topic labeling, entity disambiguation, sentiment analysis, and search for synonyms. These techniques can have real-time constraints that require very fast computation of thousands of set intersection counting queries with little space overhead and minimal error. On one hand, while sketching techniques for approximate intersection counting exist and have very fast query time, many have issues with accuracy, especially for pairs of lists that have low Jaccard similarity. On the other hand, space-efficient computation of exact intersection sizes is particularly challenging in real time. In this paper, we show how an efficient space-time trade-off can be achieved for exact set intersection counting, by combining state-of-the-art algorithms with precomputation and judicious use of compression. In addition, we show that the performance can be further improved by combining the best aspects of these algorithms. We present experimental evidence that real-time computation of exact intersection sizes is feasible with low memory overhead: we improve the mean query time of baseline approaches by over a factor of 100 using a data structure that takes merely twice the size of an inverted index. Overall, in our experiments, we achieve running times within the same order of magnitude as well-known approximation techniques.
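For readers unfamiliar with the underlying problem, the following is a minimal sketch of exact intersection counting over sorted posting lists, together with the Jaccard similarity derived from it. It shows only a plain linear-merge baseline, not the precomputation or compression scheme the abstract describes. The function names and the example posting lists are assumptions made for illustration.

```python
def intersection_count(a, b):
    """Count elements common to two sorted, duplicate-free lists of document ids.

    Uses a linear merge, so the cost is O(len(a) + len(b)).
    """
    i = j = count = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            count += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return count


def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| computed from the exact count."""
    inter = intersection_count(a, b)
    union = len(a) + len(b) - inter
    return inter / union if union else 0.0


# Hypothetical inverted-index entries: phrase -> sorted ids of documents containing it.
postings = {
    "neural network": [1, 4, 7, 9, 12],
    "deep learning": [4, 5, 9, 12, 20],
}
common = intersection_count(postings["neural network"], postings["deep learning"])
similarity = jaccard(postings["neural network"], postings["deep learning"])
```

Here the two lists share documents 4, 9, and 12, so the co-occurrence count is 3 and the union has 7 documents. A merge like this is exact but touches every entry of both lists, which is the kind of per-query cost that matters when thousands of such queries must be answered in real time.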

Choice-Based Conjoint Analysis: Classification vs. Discrete Choice Models

Giesen

Mueller

Taneva

et al. 2010

ANNOTATE: orgANizing uNstructured cOntenTs viA Topic labEls

Ajwani

Taneva

Dutta

et al. 2018

Automatic population of knowledge bases with multimodal data about named entities

Taneva

2013

Bilyana Taneva

Gathering and ranking photos of named entities with high precision, high recall, and diversity

Gem-based entity-knowledge maintenance

Mining acronym expansions and their meanings using query click log

Finding images of difficult entities in the long tail

Efficient Set Intersection Counting Algorithm for Text Similarity Measures

Choice-Based Conjoint Analysis: Classification vs. Discrete Choice Models

ANNOTATE: orgANizing uNstructured cOntenTs viA Topic labEls

Automatic population of knowledge bases with multimodal data about named entities