We describe the Second Multilingual Named Entity Challenge in Slavic languages. The task comprises recognizing mentions of named entities in Web documents, normalizing them, and linking them across languages. The Challenge was organized as part of the 7th Balto-Slavic Natural Language Processing Workshop, co-located with the ACL-2019 conference. Eight teams participated in the competition, which covered four languages and five entity types. Performance on the named entity recognition task reached 90% F-measure, much higher than reported in the first edition of the Challenge. Seven teams covered all four languages, and five teams participated in the cross-lingual entity linking task. Detailed evaluation information is available on the shared task web page.
In this paper we present an adaptation of the Liner2 framework to the BSNLP 2017 shared task on multilingual named entity recognition. The tool is tuned to recognize and lemmatize named entities for Polish.
A key challenge in Information Extraction within Natural Language Processing is the ability to recognise and classify temporal expressions (timexes). Timexes are a crucial source of information about when something happens, how often it occurs, or how long it lasts. Timexes extracted automatically from text play a major role in many Information Extraction systems, such as question answering or event recognition. We prepared a broad specification of Polish timexes, called PLIMEX. It is based on the state-of-the-art annotation guidelines for English, mainly TIMEX2 and TIMEX3 (a part of TimeML, the Markup Language for Temporal and Event Expressions). We extended the specification with a description of the local meaning of timexes, based on the LTIMEX annotation guidelines for English. Temporal description supports further event identification and extends the event description model, focussing on anchoring events in time, ordering events, and reasoning about the persistence of events. The specification is designed to address these issues, and we used it to annotate all documents in the Polish Corpus of Wroclaw University of Technology (KPWr). We also adapted our Liner2 machine learning system to recognise Polish timexes, and we propose a two-phase method to select a subset of features for the Conditional Random Fields sequence labelling method. This article presents the whole process: corpus annotation, evaluation of inter-annotator agreement, extension of the Liner2 system with new features, and evaluation of the recognition models before and after feature selection, including an analysis of the statistical significance of the differences. Liner2 with the presented models is available as open source software under the GNU General Public License.