Bootstrapping semantics from text is one of the greatest challenges in natural language learning. We first define a word similarity measure based on the distributional patterns of words. This measure allows us to construct a thesaurus from a parsed corpus. We then present a new evaluation methodology for the automatically constructed thesaurus. The evaluation results show that the thesaurus is significantly closer to WordNet than Roget's Thesaurus is.
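A distributional similarity measure of this kind might be sketched as follows. This is a minimal illustration, not the paper's exact formula: the toy co-occurrence counts and dependency-feature names are invented, and the measure shown is a Lin-style ratio of shared to total pointwise-mutual-information feature weight.

```python
from collections import Counter
import math

def pmi_vector(word, cooc, word_totals, feat_totals, total):
    """Pointwise mutual information of each context feature for `word`,
    keeping only positively associated features."""
    vec = {}
    for feat, n in cooc[word].items():
        pmi = math.log((n * total) / (word_totals[word] * feat_totals[feat]))
        if pmi > 0:
            vec[feat] = pmi
    return vec

def lin_similarity(w1, w2, cooc, word_totals, feat_totals, total):
    """Lin-style similarity: weight of shared features over total feature weight."""
    v1 = pmi_vector(w1, cooc, word_totals, feat_totals, total)
    v2 = pmi_vector(w2, cooc, word_totals, feat_totals, total)
    shared = sum(v1[f] + v2[f] for f in v1.keys() & v2.keys())
    denom = sum(v1.values()) + sum(v2.values())
    return shared / denom if denom else 0.0

# Toy co-occurrence counts: word -> {dependency feature: count}
cooc = {
    "duty":       Counter({"obj-of:assume": 3, "mod:fiduciary": 2, "obj-of:avert": 1}),
    "obligation": Counter({"obj-of:assume": 2, "mod:fiduciary": 1, "obj-of:incur": 2}),
    "banana":     Counter({"mod:ripe": 4, "obj-of:peel": 3}),
}
word_totals = {w: sum(c.values()) for w, c in cooc.items()}
feat_totals = Counter()
for c in cooc.values():
    feat_totals.update(c)
total = sum(word_totals.values())

print(lin_similarity("duty", "obligation", cooc, word_totals, feat_totals, total))
print(lin_similarity("duty", "banana", cooc, word_totals, feat_totals, total))
```

Words sharing informative dependency contexts ("duty"/"obligation") score well above words that share none ("duty"/"banana"), which is the property a distributional thesaurus builds on.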
One of the main challenges in question answering is the potential mismatch between the expressions in questions and the expressions in texts. While humans appear to use inference rules such as ‘X writes Y’ implies ‘X is the author of Y’ in answering questions, such rules are generally unavailable to question-answering systems due to the inherent difficulty of constructing them. In this paper, we present an unsupervised algorithm for discovering inference rules from text. Our algorithm is based on an extended version of Harris’ Distributional Hypothesis, which states that words that occur in the same contexts tend to be similar. Instead of applying this hypothesis to words, we apply it to paths in the dependency trees of a parsed corpus. Essentially, if two paths tend to link the same sets of words, we hypothesize that their meanings are similar. We use examples to show that our system discovers many inference rules easily missed by humans.
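The path-based hypothesis above can be sketched as comparing the fillers that occupy the X and Y slots of each dependency path. The filler sets below are invented for illustration, and Jaccard overlap combined by a geometric mean is a deliberate simplification of the PMI-weighted slot similarity actually used.

```python
# Hypothetical slot-filler sets observed for three dependency paths.
paths = {
    "X writes Y": {
        "X": {"Austen", "Tolstoy", "Hemingway"},
        "Y": {"Emma", "War and Peace"},
    },
    "X is the author of Y": {
        "X": {"Austen", "Tolstoy", "Rowling"},
        "Y": {"Emma", "Harry Potter"},
    },
    "X eats Y": {
        "X": {"cat", "dog"},
        "Y": {"fish", "bone"},
    },
}

def jaccard(a, b):
    """Overlap of two filler sets."""
    return len(a & b) / len(a | b) if a | b else 0.0

def path_similarity(p1, p2):
    """Geometric mean of the per-slot filler overlap: two paths are
    similar when they tend to link the same Xs to the same Ys."""
    sx = jaccard(paths[p1]["X"], paths[p2]["X"])
    sy = jaccard(paths[p1]["Y"], paths[p2]["Y"])
    return (sx * sy) ** 0.5

print(round(path_similarity("X writes Y", "X is the author of Y"), 3))  # → 0.408
print(path_similarity("X writes Y", "X eats Y"))                        # → 0.0
```

Paths expressing the same relation share fillers in both slots and so score high together, which is exactly the signal that lets an inference rule like ‘X writes Y’ ⇒ ‘X is the author of Y’ be discovered without supervision.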
Non-compositional expressions present a special challenge to NLP applications. We present a method for the automatic identification of non-compositional expressions using their statistical properties in a text corpus. Our method is based on the hypothesis that when a phrase is non-compositional, its mutual information differs significantly from the mutual information of phrases obtained by substituting one of the words in the phrase with a similar word.
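The substitution test could be sketched as below. The corpus counts, the substitute lists, and the fixed log-unit margin are all illustrative assumptions; the point is only the shape of the hypothesis: a phrase whose mutual information towers over that of its near-synonym variants is a candidate non-compositional expression.

```python
import math
from collections import Counter

# Toy corpus statistics (invented for illustration).
unigrams = Counter({"red": 100, "hot": 80, "scarlet": 20,
                    "tape": 50, "ribbon": 40})
bigrams = Counter({("red", "tape"): 30, ("scarlet", "tape"): 0,
                   ("red", "ribbon"): 8})
N = 10000  # assumed corpus size

def mutual_information(w1, w2):
    """Pointwise mutual information of the bigram (w1, w2)."""
    n = bigrams[(w1, w2)]
    if n == 0:
        return float("-inf")
    return math.log((n * N) / (unigrams[w1] * unigrams[w2]))

def looks_non_compositional(w1, w2, substitutes, margin=1.0):
    """Flag (w1, w2) when its mutual information exceeds that of every
    phrase formed by swapping in a similar word, by more than `margin`."""
    mi = mutual_information(w1, w2)
    return all(mi - mutual_information(a, b) > margin for a, b in substitutes)

# "red tape" (idiomatic) vs. substitutions with distributionally similar words:
print(looks_non_compositional("red", "tape",
                              [("scarlet", "tape"), ("red", "ribbon")]))  # → True
```

Replacing "red" with "scarlet" destroys the phrase entirely, and "red ribbon" is far less associated than "red tape"; a compositional phrase would degrade gracefully under such substitutions instead.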
Manually compiled dictionaries usually serve as the source of word sense inventories. However, they often include many rare senses while missing corpus- or domain-specific senses. We present a clustering algorithm called CBC (Clustering By Committee) that automatically discovers word senses from text. It initially discovers a set of tight clusters, called committees, that are well scattered in the similarity space. The centroid of a committee's members is used as the feature vector of the cluster. We then assign words to their most similar clusters. After assigning an element to a cluster, we remove the features it shares with that cluster from the element. This allows CBC to discover the less frequent senses of a word and to avoid discovering duplicate senses. Each cluster that a word belongs to represents one of its senses. We also present an evaluation methodology for automatically measuring the precision and recall of discovered senses.
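The assignment-with-feature-removal step is the distinctive part, and might be sketched as follows. The committees, feature names, and the example word "plant" with its two senses are invented for illustration; committee discovery itself is not shown.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse feature vectors (dicts)."""
    num = sum(u[f] * v[f] for f in set(u) & set(v))
    du = math.sqrt(sum(x * x for x in u.values()))
    dv = math.sqrt(sum(x * x for x in v.values()))
    return num / (du * dv) if du and dv else 0.0

def centroid(members):
    """Average feature vector of a committee's members."""
    c = {}
    for vec in members:
        for f, w in vec.items():
            c[f] = c.get(f, 0.0) + w / len(members)
    return c

def assign_senses(word_vec, committees, threshold=0.1):
    """Repeatedly assign the word to its most similar committee, then
    strip the committee's features so less frequent senses can surface."""
    vec = dict(word_vec)
    senses = []
    while True:
        score, best = max((cosine(vec, c), name) for name, c in committees.items())
        if score < threshold:
            break
        senses.append(best)
        for f in committees[best]:
            vec.pop(f, None)  # remove the overlapping features
    return senses

# Toy committees (centroids of tight clusters) relevant to "plant":
committees = {
    "organization": centroid([{"obj-of:close": 1.0, "mod:manufacturing": 1.0},
                              {"obj-of:close": 0.8, "mod:chemical": 1.0}]),
    "life-form":    centroid([{"mod:flowering": 1.0, "obj-of:water": 1.0},
                              {"mod:green": 1.0, "obj-of:water": 0.7}]),
}
plant = {"obj-of:close": 0.9, "mod:manufacturing": 0.5,
         "mod:flowering": 0.6, "obj-of:water": 0.4}
print(assign_senses(plant, committees))  # → ['organization', 'life-form']
```

Without the removal step, "plant" would simply land in its single most similar cluster; stripping the factory-sense features after the first assignment is what lets the rarer botanical sense emerge as a second cluster membership.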