Learner corpora, electronic collections of spoken or written data produced by foreign language learners, offer unparalleled access to many hitherto unexplored aspects of learner language, particularly in their error-tagged format. This article aims to demonstrate the role that learner corpora can play in CALL, particularly when used in conjunction with web-based interfaces that provide flexible access to error-tagged corpora that have been enhanced with simple NLP techniques, such as POS-tagging or lemmatization, and linked to a wide range of learner and task variables, such as mother-tongue background or activity type. This new resource is of interest to three main types of users: teachers wishing to prepare pedagogical materials that target learners' attested difficulties; learners themselves, for editing or language-awareness purposes; and NLP researchers, for whom it serves as a benchmark for testing automatic error detection systems.
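The kind of query such an interface supports can be sketched as a filter over error-tagged tokens carrying learner metadata. This is a minimal illustration only: the field names, error labels and toy data below are invented, not the schema of any actual learner corpus.

```python
# Hypothetical sketch: querying an error-tagged, POS-tagged learner corpus
# filtered by learner variables (L1, task type). All data is invented.

corpus = [
    {"token": "informations", "pos": "NOUN", "lemma": "information",
     "error": "number", "l1": "French", "task": "essay"},
    {"token": "discuss", "pos": "VERB", "lemma": "discuss",
     "error": None, "l1": "French", "task": "essay"},
    {"token": "depends of", "pos": "VERB", "lemma": "depend",
     "error": "preposition", "l1": "Spanish", "task": "letter"},
]

def find_errors(corpus, l1=None, error_type=None):
    """Return error-annotated tokens matching the given learner filters."""
    hits = []
    for tok in corpus:
        if tok["error"] is None:
            continue  # keep only tokens carrying an error tag
        if l1 is not None and tok["l1"] != l1:
            continue
        if error_type is not None and tok["error"] != error_type:
            continue
        hits.append(tok)
    return hits

french_number_errors = find_errors(corpus, l1="French", error_type="number")
print([t["token"] for t in french_number_errors])  # ['informations']
```

A teacher could use such filters to pull all attested errors of one type for one mother-tongue group when preparing targeted exercises.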
This article focuses on the development of Natural Language Processing (NLP) tools for Computer-Assisted Language Learning (CALL). After identifying the inherent limitations of NLP-free CALL tools, we describe the general framework of Mirto, an NLP-based authoring platform under development in our laboratory, organized into four distinct and successive layers: functions, scripts, activities and scenarios. Through several examples, we explain how Mirto's architecture makes it possible to implement state-of-the-art NLP functions, integrate them into easily handled scripts, and thereby create, without any programming skills, didactic activities that can in turn be combined into more complex sequences, or scenarios.
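The four-layer organization can be sketched as successive levels of wrapping, each layer building on the one below. The function and script names here are illustrative stand-ins, not Mirto's actual API.

```python
# Minimal sketch of a four-layer authoring architecture in the spirit of
# Mirto (functions -> scripts -> activities -> scenarios). All names are
# hypothetical illustrations.

def lemmatize(word):                      # layer 1: an NLP function (stub)
    return {"reads": "read", "read": "read"}.get(word, word)

def gap_fill_script(sentence, target):    # layer 2: a script wrapping functions
    gapped = sentence.replace(target, "____")
    return {"prompt": gapped, "answer": target, "lemma": lemmatize(target)}

def make_activity(sentences, target):     # layer 3: an activity built from a script
    return [gap_fill_script(s, target) for s in sentences if target in s]

def scenario(*activities):                # layer 4: a scenario = activity sequence
    return [item for act in activities for item in act]

act = make_activity(["He reads books.", "She reads mail."], "reads")
print(act[0]["prompt"])  # He ____ books.
```

The point of the layering is that an author only manipulates the upper two layers, while the NLP machinery stays hidden in the lower two.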
In this paper, we study how single-word term extraction and bilingual lexical alignment can be used and combined to assist terminologists in compiling bilingual specialized dictionaries. Two tools, a term extractor called TermoStat and a sentence and lexical aligner called Alinea, are tested in a project whose aim is the development of an English-French dictionary on climate change. We analyze the results of lexical alignment on the basis of a typology of terminological equivalents. We first extracted French candidate terms, which were then submitted to the lexical aligner. The results show that these tools are a valuable asset for compiling bilingual dictionaries: most equivalents provided by the aligner were valid, and the tool was able to locate several valid English equivalents (some of them structurally different) for the candidate terms.
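The pipeline described, feeding extracted candidate terms into an aligner and keeping well-attested equivalents, can be sketched as follows. The toy alignment table and the threshold are invented for illustration; they do not reflect the actual output of TermoStat or Alinea.

```python
# Illustrative sketch of combining term extraction with lexical alignment
# to propose dictionary equivalents. Data and thresholds are invented.

candidate_terms = ["réchauffement", "gaz à effet de serre"]

# Toy word-alignment counts from a sentence-aligned bilingual corpus:
# French candidate term -> {English equivalent: alignment count}
alignments = {
    "réchauffement": {"warming": 12, "heating": 1},
    "gaz à effet de serre": {"greenhouse gas": 8},
}

def propose_equivalents(terms, alignments, min_count=2):
    """Keep aligned equivalents attested at least min_count times,
    ranked by alignment frequency."""
    proposals = {}
    for term in terms:
        counts = alignments.get(term, {})
        ranked = sorted(counts.items(), key=lambda kv: -kv[1])
        proposals[term] = [en for en, n in ranked if n >= min_count]
    return proposals

print(propose_equivalents(candidate_terms, alignments))
# {'réchauffement': ['warming'], 'gaz à effet de serre': ['greenhouse gas']}
```

Filtering by frequency is one simple way to separate valid equivalents from alignment noise; the terminologist still validates the surviving candidates.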
This paper presents a "didactic triangulation" strategy to cope with the reliability problems of NLP applications in Computer-Assisted Language Learning (CALL) systems. It is based on the implementation of basic but well-mastered NLP techniques, and puts the emphasis on an appropriate match between computable linguistic clues and the didactic features of the evaluated activities. We claim that a correct balance between false positives (i.e. false error detections) and false negatives (i.e. undetected errors) is not only an outcome of NLP techniques, but also of an appropriate didactic integration of what NLP can do well and what it cannot. Based on this approach, ExoGen is a prototype for generating activities such as gap-fill exercises. It integrates a module for error detection and description, which checks learners' answers against expected ones. Through the analysis of graphic, orthographic and morphosyntactic differences, it is able to diagnose problems such as spelling errors, lexical mix-ups, agreement errors, conjugation errors, etc. A first evaluation of ExoGen's output, based on the FRIDA learner corpus, has yielded very promising results, paving the way for the development of an efficient, general model suited to a wide variety of activities.
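The core idea of checking a learner's answer against the expected form and labelling the difference can be sketched with simple string comparison. The categories and heuristics below (a naive lemmatizer, a similarity threshold) are simplified illustrations, not ExoGen's actual diagnosis module.

```python
# Hedged sketch of gap-fill answer checking: derive a coarse error label
# from the difference between the learner's answer and the expected form.
# The heuristics are invented simplifications for illustration.

import difflib

def diagnose(expected, answer, lemma_of=lambda w: w.rstrip("s")):
    """Return a coarse error label by comparing answer to expected."""
    if answer == expected:
        return "correct"
    if lemma_of(answer) == lemma_of(expected):
        return "agreement/conjugation error"   # same lemma, wrong form
    ratio = difflib.SequenceMatcher(None, expected, answer).ratio()
    if ratio > 0.8:
        return "spelling error"                # near-identical strings
    return "lexical mix-up"                    # a different word entirely

print(diagnose("chiens", "chien"))    # agreement/conjugation error
print(diagnose("maison", "maiosn"))   # spelling error
print(diagnose("maison", "voiture"))  # lexical mix-up
```

In this scheme the didactic balance described above would show up in threshold choices: a stricter similarity cut-off trades false positives for false negatives.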
In this article we show how to exploit a corpus annotated with syntactic dependencies: we seek to extract cooccurrents that synthesize the lexico-syntactic combinatorics of words, and also to work at a more general level on expressions, or even constructions, that are more complex and more abstract than simple lexical pivots. To implement the underlying queries, and to allow non-expert users to manipulate them, we propose to guide exploration by analogy and to build queries from examples with the Lexicoscope tool.
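Extracting cooccurrents from a dependency-annotated corpus amounts to counting dependents of a pivot lemma under a given relation. The triples and relation labels below are invented for illustration and are not Lexicoscope's query language.

```python
# Illustrative sketch of lexico-syntactic cooccurrent extraction from
# dependency triples. The toy parsed corpus is invented.

from collections import Counter

# (head_lemma, relation, dependent_lemma) triples from a parsed corpus
triples = [
    ("make", "obj", "decision"),
    ("make", "obj", "decision"),
    ("make", "obj", "mistake"),
    ("take", "obj", "decision"),
]

def cooccurrents(triples, pivot, relation):
    """Count dependents of a pivot lemma under a given relation."""
    return Counter(dep for head, rel, dep in triples
                   if head == pivot and rel == relation)

print(cooccurrents(triples, "make", "obj").most_common())
# [('decision', 2), ('mistake', 1)]
```

An example-based interface in the spirit described above would let the user point at one attested triple and generalize it into such a query, rather than writing the query by hand.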