Classifier combination is an effective and broadly useful method of improving system performance. This article investigates in depth a large number of both well-established and novel classifier combination approaches for the word sense disambiguation task, studied over a diverse classifier pool which includes feature-enhanced Naïve Bayes, Cosine, Decision List, Transformation-based Learning and MMVC classifiers. Each classifier has access to the same rich feature space, comprised of distance weighted bag-of-lemmas, local ngram context and specific syntactic relations, such as Verb-Object and Noun-Modifier. This study examines several key issues in system combination for the word sense disambiguation task, ranging from algorithmic structure to parameter estimation. Experiments using the standard SENSEVAL2 lexical-sample data sets in four languages (English, Spanish, Swedish and Basque) demonstrate that the combination system obtains a significantly lower error rate when compared with other systems participating in the SENSEVAL2 exercise, yielding state-of-the-art performance on these data sets.
This poster investigates a novel query suggestion technique that selects query refinements through a combination of many users' post-query navigation patterns and the query logs of a large search engine. We compare this technique, which uses the queries that retrieve in the top-ranked search results places where searchers end up after post-query browsing (i.e., the landing pages), with an approach based on query refinements from user search sessions extracted from query logs. Our findings demonstrate the effectiveness of using landing pages for the direct generation of query suggestions, as well as the complementary nature of the suggestions it generates with regard to traditional query log based refinement methodologies.
This paper presents a method for bootstrapping a fine-grained, broad-coverage part-of-speech (POS) tagger in a new language using only one personday of data acquisition effort. It requires only three resources, which are currently readily available in 60-100 world languages: (1) an online or hardcopy pocket-sized bilingual dictionary, (2) a basic library reference grammar, and (3) access to an existing monolingual text corpus in the language. The algorithm begins by inducing initial lexical POS distributions from English translations in a bilingual dictionary without POS tags. It handles irregular, regular and semi-regular morphology through a robust generative model using weighted Levenshtein alignments. Unsupervised induction of grammatical gender is performed via global modeling of contextwindow feature agreement. Using a combination of these and other evidence sources, interactive training of context and lexical prior models are accomplished for fine-grained POS tag spaces. Experiments show high accuracy, fine-grained tag resolution with minimal new human effort.
Exploiting lexical and semantic relationships in large unstructured text collections can significantly enhance managing, integrating, and querying information locked in unstructured text. Most notably, named entities and relations between entities are crucial for effective question answering and other information retrieval and knowledge management tasks. Unfortunately, the success in extracting these relationships can vary for different domains, languages, and document collections. Predicting extraction performance is an important step towards scalable and intelligent knowledge management, information retrieval and information integration. We present a general language modeling method for quantifying the difficulty of information extraction tasks. We demonstrate the viability of our approach by predicting performance of real world information extraction tasks, Named Entity recognition and Relation Extraction.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.