This paper presents the system submitted by University of Wolverhampton for SemEval-2014 task 1. We proposed a machine learning approach which is based on features extracted using Typed Dependencies, Paraphrasing, Machine Translation evaluation metrics, Quality Estimation metrics and Corpus Pattern Analysis. Our system performed satisfactorily and obtained 0.711 Pearson correlation for the semantic relatedness task and 78.52% accuracy for the textual entailment task.
This paper describes the first SemEval task to explore the use of Natural Language Processing systems for building dictionary entries, in the framework of Corpus Pattern Analysis. CPA is a corpus-driven technique which provides tools and resources to identify and represent unambiguously the main semantic patterns in which words are used. Task 15 draws on the Pattern Dictionary of English Verbs (www.pdev.org.uk), for the targeted lexical entries, and on the British National Corpus for the input text.Dictionary entry building is split into three subtasks which all start from the same concordance sample: 1) CPA parsing, where arguments and their syntactic and semantic categories have to be identified, 2) CPA clustering, in which sentences with similar patterns have to be clustered and 3) CPA automatic lexicography where the structure of patterns have to be constructed automatically. Subtask 1 attracted 3 teams, though none could beat the baseline (rule-based system). Subtask 2 attracted 2 teams, one of which beat the baseline (majority-class classifier). Subtask 3 did not attract any participant.The task has produced a major semantic multidataset resource which includes data for 121 verbs and about 17,000 annotated sentences, and which is freely accessible.
Actual research on child-machine interaction indicate that children are specific with respect to various acoustic, linguistic [7], psychological, cultural and social factors. We wish to address the linguistic factor, focusing on the semantic knowledge which needs to be mastered by a computer system designed to interact with children. Our work is intentionally usage-based and application-driven.
Purpose This paper presents a novel neural-based approach, applicable to any searchable PDF document that first detects the titles and then hierarchically orders them using a sequence labelling approach to generate automatically the Table of Contents (TOC). A TOC signals the main divisions and subdivisions of a document to assist with navigation and information localisation. Methods Unlike previous methods, we do not assume the presence of parsable TOC pages in the document but infer the TOC from a data-driven analysis of sections titles, their order and their depth. Results We offer an exhaustive analysis of the proposed model and evaluate it on French and English using documents from the financial domain, which we release to increase community's interest. We compare this model to state-of-the-art approaches and show its superiority in multiple experiments. Conclusions The approach described in this paper can easily be adapted to other domains and documents and its application to the analysis of financial prospectuses will be strengthened by the release of datasets. The TOC generation algorithms used in this paper obtain state-of-the-art results and provide strong baselines for future work.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.