This work investigates the use of linguistically motivated features to improve search, in particular for ranking answers to non-factoid questions. We show that it is possible to exploit existing large collections of question–answer pairs (from online social Question Answering sites) to extract such features and train ranking models which combine them effectively. We investigate a wide range of feature types, some exploiting natural language processing such as coarse word sense disambiguation, named-entity identification, syntactic parsing, and semantic role labeling. Our experiments demonstrate that linguistic features, in combination, yield considerable improvements in accuracy. Depending on the system settings we measure relative improvements of 14% to 21% in Mean Reciprocal Rank and Precision@1, providing some of the most compelling evidence to date that complex linguistic features such as word senses and semantic roles can have a significant impact on large-scale information retrieval tasks.
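For reference, the two evaluation metrics reported above can be computed as follows. This is a minimal sketch; the input format (one ranked list of binary relevance labels per question) is an assumption for illustration:

```python
def mrr(rankings):
    """Mean Reciprocal Rank: average of 1/rank of the first correct answer.

    rankings: list of ranked lists of 0/1 relevance labels, one per question.
    Questions with no correct answer contribute 0.
    """
    total = 0.0
    for labels in rankings:
        for rank, rel in enumerate(labels, start=1):
            if rel:
                total += 1.0 / rank
                break
    return total / len(rankings)


def precision_at_1(rankings):
    """Fraction of questions whose top-ranked answer is correct."""
    return sum(1 for labels in rankings if labels and labels[0]) / len(rankings)
```

A 14% relative improvement in MRR, for instance, means the ratio of the new score to the baseline score is 1.14, not a 14-point absolute gain.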
In this paper we design and implement a benchmarking framework for the fair and exhaustive comparison of entity-annotation systems. The framework is based on the definition of a set of problems related to the entity-annotation task, a set of measures to evaluate system performance, and a systematic comparative evaluation involving all publicly available datasets, containing texts of various types such as news, tweets, and Web pages. Our framework is easily extensible with novel entity annotators, datasets, and evaluation measures, and it has been released to the public as open source. We use this framework to perform the first extensive comparison of all available entity annotators over all available datasets, and draw many interesting conclusions about their efficiency and effectiveness. We also compare academic and commercial annotators.
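A typical measure in such a framework scores a system's annotations against gold-standard ones by exact match on mention span and linked entity. A minimal sketch (the triple representation is an assumption; real frameworks also support weaker match criteria such as span overlap):

```python
def annotation_f1(gold, pred):
    """F1 over entity annotations, each a (start, end, entity) triple.

    gold, pred: sets of triples for one document; a prediction counts as a
    true positive only if both the mention span and the entity match exactly.
    """
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Corpus-level scores are then obtained by micro-averaging (pooling triples over all documents) or macro-averaging (averaging per-document F1).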
Abstract. We present a comprehensive approach to ontology evaluation and validation, which has become a crucial problem for the development of semantic technologies. Existing evaluation methods are integrated into one single framework by means of a formal model. This model consists, firstly, of a metaontology called O2, which characterises ontologies as semiotic objects. Based on O2 and an analysis of existing methodologies, we identify three main types of measures for evaluation: structural measures, which are typical of ontologies represented as graphs; functional measures, which are related to the intended use of an ontology and of its components; and usability-profiling measures, which depend on the level of annotation of the considered ontology. The metaontology is then complemented with an ontology of ontology validation called oQual, which provides the means to devise the best set of criteria for choosing one ontology over others in the context of a given project. Finally, we provide a small example of how to apply oQual-derived criteria to a validation case.
In this paper we approach word sense disambiguation and information extraction as a unified tagging problem. The task consists of annotating text with the tagset defined by the 41 WordNet supersense classes for nouns and verbs. Since the tagset is directly related to WordNet synsets, the tagger returns partial word sense disambiguation. Furthermore, since the noun tags include the standard named-entity detection classes (person, location, organization, time, etc.), the tagger, as a by-product, returns extended named-entity information. We cast the problem of supersense tagging as a sequential labeling task and investigate it empirically with a discriminatively-trained Hidden Markov Model. Experimental evaluation on the main sense-annotated datasets available, i.e., SemCor and Senseval, shows considerable improvements over the best known "first-sense" baseline.
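The decoding step shared by HMM-style sequence taggers is Viterbi search over tag sequences. The sketch below is a generic first-order Viterbi decoder, not the paper's discriminative model; the supersense tags and all probabilities in the test are toy values for illustration:

```python
import math

def viterbi(obs, states, start_p, trans_p, emit_p):
    """First-order HMM decoding in log space.

    obs: list of observed tokens; states: list of tags.
    start_p[s], trans_p[prev][s], emit_p[s][token] are probabilities;
    unseen emissions get a tiny floor to avoid log(0).
    """
    def emit(s, token):
        return math.log(emit_p[s].get(token, 1e-12))

    # Forward pass: best log-score of any path ending in state s at time t.
    V = [{s: math.log(start_p[s]) + emit(s, obs[0]) for s in states}]
    back = []
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for s in states:
            prev = max(states, key=lambda p: V[t - 1][p] + math.log(trans_p[p][s]))
            V[t][s] = V[t - 1][prev] + math.log(trans_p[prev][s]) + emit(s, obs[t])
            back[-1][s] = prev

    # Backtrack from the best final state.
    best = max(states, key=lambda s: V[-1][s])
    path = [best]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return list(reversed(path))
```

In supersense tagging the state space is the 41 supersense labels (plus a "no sense" tag), and the discriminative training replaces the generative emission probabilities with feature-based scores, while the decoding stays the same.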
Search results clustering (SRC) is a challenging algorithmic problem that requires grouping the results returned by one or more search engines into topically coherent clusters, and labeling the clusters with meaningful phrases that describe the topics of the results they contain. In this paper we propose to solve SRC via an innovative approach that models the problem as the labeled clustering of the nodes of a newly introduced graph of topics. The topics are Wikipedia pages identified by means of recently proposed topic annotators [9,11,16,20] applied to the search results, and the edges denote the relatedness among these topics, computed by taking into account the linkage of the Wikipedia graph. We tackle this problem by designing a novel algorithm that exploits the spectral properties and the labels of that graph of topics. We show the superiority of our approach with respect to academic state-of-the-art work [6] and well-known commercial systems (Clusty and Lingo3G) by performing an extensive set of experiments on standard datasets and user studies via Amazon Mechanical Turk. We test several standard measures for evaluating the performance of all systems and show a relative improvement of up to 20%.
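The edge weights derived from "the linkage of the Wikipedia graph" are, in this line of work, commonly computed with Milne and Witten's link-based relatedness measure. A minimal sketch (whether this exact measure is the one used here is an assumption; inputs are sets of in-linking page ids):

```python
import math

def milne_witten_relatedness(in_a, in_b, n_pages):
    """Link-based semantic relatedness between two Wikipedia topics.

    in_a, in_b: sets of pages that link to topics a and b;
    n_pages: total number of pages in Wikipedia.
    Returns a value in [0, 1]; 0 when the topics share no in-links.
    """
    overlap = len(in_a & in_b)
    if overlap == 0:
        return 0.0
    a, b = len(in_a), len(in_b)
    distance = (math.log(max(a, b)) - math.log(overlap)) / \
               (math.log(n_pages) - math.log(min(a, b)))
    return max(0.0, 1.0 - distance)
```

Topics with heavily overlapping in-link sets score near 1, which is what lets a clustering algorithm treat strong edges as evidence that two results belong to the same cluster.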
We present a new framework for classifying common nouns that extends named-entity classification. We use a fixed set of 26 semantic labels, which we call supersenses. These are the labels used by the lexicographers who develop WordNet. This framework has a number of practical advantages. We show how information contained in the dictionary can be used as additional training data that improves accuracy in learning new nouns. We also define a more realistic evaluation procedure than cross-validation.
We discuss the problem of ranking a very large number of entities of different types. In particular, we deal with a heterogeneous set of types, some very generic and some very specific. We discuss two approaches to this problem: (i) exploiting the entity containment graph, and (ii) using a Web search engine to compute entity relevance. We evaluate these approaches on the real task of ranking Wikipedia entities typed with a state-of-the-art named-entity tagger. Results show that both approaches can greatly improve the performance of methods based only on passage retrieval.
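The containment-based idea can be illustrated in its simplest form: score each candidate entity by how many retrieved passages mention it. This is a deliberately naive sketch, not the paper's actual graph-based method, and the string-containment match is an assumption:

```python
from collections import Counter

def rank_entities(passages, entities):
    """Rank candidate entities by the number of retrieved passages mentioning them.

    passages: list of passage strings returned by a retrieval system.
    entities: list of candidate entity names.
    Returns entity names in decreasing order of passage count; entities
    mentioned in no passage are omitted.
    """
    scores = Counter()
    for text in passages:
        lowered = text.lower()
        for entity in entities:
            if entity.lower() in lowered:
                scores[entity] += 1
    return [entity for entity, _ in scores.most_common()]
```

The second approach in the abstract would replace the passage counts with relevance estimates obtained from Web search engine results for each entity.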
WWW 2016 General Chairs' Welcome We welcome you to this OUVERT ("open" in English) WWW2016 conference, the 25th of the series, being held at the Palais des congrès de Montréal. OUVERT is our motto, to show our support and encouragement of the Web's ethos of open data, government, health, education and more. The annual World Wide Web Conference is the premier international forum to present and discuss progress in research, development, standards, and applications related to the Web and to Web science. WWW is organized under the aegis of the International World Wide Web Conference Committee (IW3C2) in collaboration with local conference organizers of the host country, in this case the Université du Québec à Montréal (UQAM). WWW 2016 offers a unique opportunity to share the latest insights of academic and industrial research, as well as to experience Montreal, a vibrant city sharing features from both Europe and North America. WWW 2016 offers you an opportunity to participate in high quality technical activities, including research sessions, poster sessions, workshops, tutorials, demonstrations, an industry track, a W3C track, panels, and a Ph.D. symposium. Co-located events include the 3rd edition of the Big Data Innovators Gathering (BIG 2016), the 2nd edition of the Entrepreneurs Track (ET), the Digital Health Conference (DH), the Web for All conference (W4A), and a meeting and exhibition by l'Académie québécoise de 'Pataphysique (AQ'P). A special event on Wednesday night includes a talk, open to the public, by the Baroness Martha Lane Fox entitled "Dot Everyone: Power, the Internet and You." We will also have three other keynote speeches by world-class experts: Tim Berners-Lee, Mary-Ellen Zurko, and Peter Norvig. In addition, we will feature a plenary panel on the Web and Creativity chaired by digital musician Andrew Hugill and a Friday panel on "The Web and social action" featuring speakers talking about how the Web can be used to change the world for the better.
The Research track presents 118 high quality papers, 72 posters and 30 demos. The Ph.D. Symposium track has 7 presentations by doctoral students, the Industry track consists of 8 speeches from prominent industrial researchers, and the W3C track is composed of sessions on the latest Web standards and emerging technologies. In addition to the tracks and special programs, workshops and tutorials have been organized to report ongoing work and to provide in-depth knowledge on important subjects; this includes 21 workshops and 7 tutorials on a wide range of cutting-edge topics. Many individuals and institutions contributed, through their hard work, to the success of this conference. We would especially like to thank the PC chairs, Ian Horrocks and Ben Zhao, who put a huge amount of time into making sure the technical tracks were at the high academic level expected of this leading Web Conference. We also thank the track, demo, workshop and tutorial chairs, and the many workshop organisers for selecting the best possible technical content for the conference. We also thank the ...