In this paper we design and implement a benchmarking framework for fair and exhaustive comparison of entity-annotation systems. The framework is based upon the definition of a set of problems related to the entity-annotation task, a set of measures to evaluate system performance, and a systematic comparative evaluation involving all publicly available datasets, containing texts of various types such as news, tweets and Web pages. Our framework is easily extensible with novel entity annotators, datasets and evaluation measures for comparing systems, and it has been released to the public as open source. We use this framework to perform the first extensive comparison among all available entity annotators over all available datasets, and draw many interesting conclusions about their efficiency and effectiveness. We also draw conclusions about academic versus commercial annotators.
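As an illustration of the kind of evaluation measure such a framework needs, the following minimal sketch computes precision, recall and F1 over sets of annotations. It is not the framework's own code: the `Annotation` tuple layout, the function names and the strict exact-match criterion are all assumptions made for this example, and a real benchmark may also define weaker matching rules.

```python
from typing import Iterable, Set, Tuple

# Hypothetical annotation format: (start offset, mention length, entity id).
Annotation = Tuple[int, int, str]


def _scores(tp: int, n_pred: int, n_gold: int) -> Tuple[float, float, float]:
    """Turn raw counts into (precision, recall, F1), guarding against zero division."""
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def precision_recall_f1(gold: Set[Annotation],
                        predicted: Set[Annotation]) -> Tuple[float, float, float]:
    """Score one document; a prediction counts only if it matches a gold annotation exactly."""
    return _scores(len(gold & predicted), len(predicted), len(gold))


def micro_average(docs: Iterable[Tuple[Set[Annotation], Set[Annotation]]]
                  ) -> Tuple[float, float, float]:
    """Pool counts over all (gold, predicted) pairs before computing the scores."""
    tp = n_pred = n_gold = 0
    for gold, predicted in docs:
        tp += len(gold & predicted)
        n_pred += len(predicted)
        n_gold += len(gold)
    return _scores(tp, n_pred, n_gold)


if __name__ == "__main__":
    gold = {(0, 5, "Paris"), (10, 6, "France")}
    predicted = {(0, 5, "Paris"), (20, 3, "European_Union")}
    print(precision_recall_f1(gold, predicted))
```

Micro-averaging pools counts across documents, so long documents weigh more; averaging the per-document scores instead (macro-averaging) would weigh every document equally.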
WWW 2016 General Chairs' Welcome
We welcome you to this OUVERT ("open" in English) WWW 2016 conference, the 25th of the series, being held at the Palais des congrès in Montreal. OUVERT is our motto, to show our support and encouragement of the Web's ethos of open data, government, health, education and more. The annual World Wide Web Conference is the premier international forum to present and discuss progress in research, development, standards, and applications related to the Web and to Web science. WWW is organized under the aegis of the International World Wide Web Conference Committee (IW3C2) in collaboration with local conference organizers of the host country, in this case the Université du Québec à Montréal (UQAM). WWW 2016 offers a unique opportunity to share the latest insights of academic and industrial research, as well as to experience Montreal, a vibrant city sharing features from both Europe and North America. WWW 2016 offers you an opportunity to participate in high-quality technical activities, including research sessions, poster sessions, workshops, tutorials, demonstrations, an industry track, a W3C track, panels, and a Ph.D. symposium. Co-located events include the 3rd edition of the Big Data Innovators Gathering (BIG 2016), the 2nd edition of the Entrepreneurs Track (ET), the Digital Health Conference (DH), the Web for All conference (W4A), and a meeting and exhibition by l'Académie québécoise de 'Pataphysique (AQ'P). A special event on Wednesday night includes a talk, open to the public, by Baroness Martha Lane Fox entitled "Dot everyone: Power, the Internet and You." We will also have three other keynote speeches by world-class experts: Tim Berners-Lee, Mary-Ellen Zurko, and Peter Norvig. We will also feature a plenary panel on the Web and Creativity chaired by digital musician Andrew Hugill and a Friday panel on "The Web and social action" featuring speakers talking about how the Web can be used to change the world for the better.
The Research track presents 118 high-quality papers, 72 posters and 30 demos. The Ph.D. Symposium track has 7 presentations by doctoral students, the Industry track consists of 8 speeches from prominent industrial researchers, and the W3C track is composed of sessions on the latest Web standards and emerging technologies. In addition to the tracks and special programs, workshops and tutorials have been organized to report ongoing work and to provide in-depth knowledge on important subjects; these include 21 workshops and 7 tutorials on a wide range of cutting-edge topics. Many individuals and institutions contributed through their hard work to the success of this conference. We would especially like to thank the PC chairs, Ian Horrocks and Ben Zhao, who put a huge amount of time into making sure the technical tracks were at the high academic level expected of this leading Web conference. We also thank the track, demo, workshop and tutorial chairs, and the many workshop organisers for selecting the best possible technical content for the conference. We also thank the ...
Abstractive text summarization of news requires a way of representing events, such as a collection of pattern clusters in which every cluster represents an event (e.g., marriage) and every pattern in the cluster is a way of expressing the event (e.g., X married Y, X and Y tied the knot). We compare three ways of extracting event patterns: heuristics-based, compression-based and memory-based. While the first has been used previously in multi-document abstraction, the latter two have never been used for this task. Compared with the first two techniques, the memory-based method allows for generating significantly more grammatical and informative sentences, at the cost of searching a vast space of hundreds of millions of parse trees of known grammatical utterances. To this end, we introduce a data structure and a search method that make it possible to efficiently extract from every sentence the parse sub-trees that match any of the stored utterances.
The need to bridge between the unstructured data on the Document Web and the structured data on the Web of Data has led to the development of a considerable number of annotation tools. However, these tools are currently still hard to compare since the published evaluation results are calculated on diverse datasets and evaluated based on different measures. We present GERBIL, an evaluation framework for semantic entity annotation. The rationale behind our framework is to provide developers, end users and researchers with easy-to-use interfaces that allow for the agile, fine-grained and uniform evaluation of annotation tools on multiple datasets. By these means, we aim to ensure that both tool developers and end users can derive meaningful insights pertaining to the extension, integration and use of annotation applications. In particular, GERBIL provides comparable results to tool developers so as to allow them to easily discover the strengths and weaknesses of their implementations with respect to the state of the art. With the permanent experiment URIs provided by our framework, we ensure the reproducibility and archiving of evaluation results. Moreover, the framework generates data in a machine-processable format, allowing for the efficient querying and post-processing of evaluation results. Finally, the tool diagnostics provided by GERBIL allow deriving insights pertaining to the areas in which tools should be further refined, thus allowing developers to create an informed agenda for extensions and end users to detect the right tools for their purposes. GERBIL aims to become a focal point for the state of the art, driving the research agenda of the community by presenting comparable objective evaluation results.
The SMAPH system implements a pipeline of four main steps: (1) Fetching: it fetches the search results returned by a search engine given the query to be annotated; (2) Spotting: search result snippets are parsed to identify candidate mentions for the entities to be annotated. This is done in a novel way by detecting the keywords-in-context by looking at the bold parts of the search snippets; (3) Candidate generation: candidate entities are generated in two ways: from the Wikipedia pages occurring in the search results, and from an existing annotator, using the mentions identified in the spotting step as input; (4) Pruning: a binary SVM classifier is used to decide which entities to keep/discard in order to generate the final annotation set for the query. The SMAPH system ranked third on the development set and first on the final blind test of the 2014 ERD Challenge short text track.
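The spotting step described above can be sketched roughly as follows. This is only an illustration and not the SMAPH implementation: it assumes that snippets arrive as HTML strings in which the search engine marks query keywords with `<b>` tags, and the function name `spot_mentions` and the lowercase normalization are hypothetical choices made for this example.

```python
import re
from typing import Iterable, List

# Assumption: bold query keywords are wrapped in plain <b>...</b> tags.
BOLD = re.compile(r"<b>(.*?)</b>", re.IGNORECASE | re.DOTALL)


def spot_mentions(snippets: Iterable[str]) -> List[str]:
    """Collect the bold spans of search snippets as candidate mentions.

    Mentions are whitespace-normalized, lowercased and de-duplicated,
    keeping the order in which they are first seen.
    """
    seen = set()
    mentions = []
    for snippet in snippets:
        for span in BOLD.findall(snippet):
            normalized = " ".join(span.split()).lower()
            if normalized and normalized not in seen:
                seen.add(normalized)
                mentions.append(normalized)
    return mentions


if __name__ == "__main__":
    snippets = [
        "<b>Armstrong</b> walked on the <b>moon</b> in 1969.",
        "Neil <b>Armstrong</b> and the lunar landing.",
    ]
    print(spot_mentions(snippets))
```

In a full pipeline, the resulting mentions would then be passed to candidate generation; the classifier-based pruning step is not sketched here.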