In this paper we describe the different NLP techniques designed and used in collaboration between the CLLE-ERSS research laboratory and the CFH / Safety Data company to manage and analyse aviation incident reports. These reports are written every time anything abnormal occurs during a civil air flight. Although most of them relate routine problems, they are a valuable source of information about possible sources of greater danger. These texts are written in plain language, show a wide range of linguistic variation (telegraphic style overcrowded by acronyms or standard prose) and exist in different languages, even for a single company/country (although our main focus is on English and French). In addition to their variety, their sheer quantity (e.g. 600/month for a large airline company) clearly requires the use of advanced NLP and text mining techniques in order to extract useful information from them. Although this context and objectives seem to indicate that standard NLP techniques can be applied in a straightforward manner, innovative techniques are required to handle the specifics of aviation report text and the complex classification systems. We present several tools that aim at a better access to this data (classification and information retrieval), and help aviation safety experts in their analyses (data/text mining and interactive analysis).Some of these tools are currently in test or in use both at the national and international levels, by airline companies as well as by regulation authorities (DGAC 1 , EASA 2 , ICAO 3 ).
This paper addresses the task of categorizing companies within industry classification schemes. The dataset consists of encyclopedic articles about companies and their economic activities. The target classification schema is build by mapping linked open data in a semi-supervised manner. Target classes are built bottom-up from DBpedia. We apply several state of the art text classification techniques, based both on deep learning and classical vectorspace models.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.