This paper provides a practical solution to the problem of generating (good) pseudowords, which are commonly used in vocabulary testing and experimental research in applied linguistics, and introduces an empirically founded approach to evaluating the suitability of pseudowords for different tasks. In the first part of the paper, we propose a novel way of generating pseudowords: a character-gram chaining algorithm. A major advantage of the algorithm is that it does not require any knowledge of the language, thereby facilitating the generation of pseudowords in any language. There is also currently a lack of formal criteria for evaluating pseudowords, both in terms of (i) their orthographic fit in the target language they are intended for and (ii) their suitability for use in various lexical processing and language teaching tasks. In the second part of the paper, we therefore argue for the need to evaluate pseudowords, propose a set of linguistic criteria for evaluating the generated pseudowords, and compare them with other current pseudoword lists using these criteria.
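The abstract does not spell out the chaining procedure, so the following is only a minimal sketch of one common way to chain character n-grams: a Markov chain over letter sequences learned from a word list. The function names (`build_chain`, `generate_pseudoword`), the boundary markers `^` and `$`, the default n-gram size, and the length limits are all assumptions made for illustration, not the authors' implementation.

```python
import random
from collections import defaultdict

# Hypothetical boundary markers; assumes they never occur inside real words.
START, END = "^", "$"


def build_chain(words, n=3):
    """Map each (n-1)-character context to the characters observed after it."""
    chain = defaultdict(list)
    for word in words:
        padded = START * (n - 1) + word.lower() + END
        for i in range(len(padded) - n + 1):
            context = padded[i:i + n - 1]
            next_char = padded[i + n - 1]
            chain[context].append(next_char)
    return chain


def generate_pseudoword(chain, known_words, n=3, min_len=4, max_len=10,
                        rng=None, max_tries=1000):
    """Chain n-grams until the end marker; reject real words and bad lengths.

    Returns None if no acceptable candidate is found within max_tries.
    """
    rng = rng or random.Random()
    for _ in range(max_tries):
        context = START * (n - 1)
        letters = []
        while len(letters) <= max_len:
            next_char = rng.choice(chain[context])
            if next_char == END:
                break
            letters.append(next_char)
            # Slide the context window forward by one character.
            context = (context + next_char)[1:]
        candidate = "".join(letters)
        if min_len <= len(candidate) <= max_len and candidate not in known_words:
            return candidate
    return None


# Usage with a tiny illustrative word list (a real run would use a large lexicon).
sample_words = ["table", "cable", "stable", "tablet",
                "candle", "handle", "cradle", "marble"]
sample_chain = build_chain(sample_words, n=3)
print(generate_pseudoword(sample_chain, set(sample_words), rng=random.Random(0)))
```

Because the chain is built only from observed letter sequences, a sketch like this needs a word list but no phonological or morphological rules, which is consistent with the language-independence the abstract describes. Larger values of `n` yield candidates closer to real words, while smaller values yield less word-like strings.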
We propose a “smart” language learning system for students to acquire domain-specific vocabulary while taking an online course. F-Lingo, a browser plugin, works on top of the FutureLearn MOOC platform to provide learners with opportunities to study the words, phrases, and concepts that are important to the course topic. F-Lingo comprises three components. The Material Gathering component crawls the web pages of the MOOC course the student has chosen, collecting the entire textual content (with some exceptions). The Vocabulary Extraction component identifies domain-specific words, phrases, and concepts, and hyperlinks them in the MOOC page to draw the student’s attention to them. Clicking a link displays a dialog window in which the lexico-grammatical features and definitions of the extracted items can be studied, together with example sentences retrieved from external resources such as Wikipedia and FLAX. The Progress Tracking component records the clicks that students make on hyperlinks and the time spent in the dialog windows. This allows us to build the student’s vocabulary learning profile under the assumption that the more time the student spends on an item, the more the item merits inclusion in a follow-up language activity. These statistics inform our ongoing work on automatically generating personalized language activities and vocabulary tests at the end of the MOOC course. F-Lingo has been made available in three Data Mining courses on the FutureLearn MOOC platform and has been used by 109 learners. This research is ongoing; future work focuses on automatically generating personalized vocabulary tests and activities based on the student’s click statistics.
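The Progress Tracking idea can be illustrated with a short sketch that aggregates clicks and dialog time per vocabulary item and ranks items by total attention. The `DialogEvent` record, its field names, and the ranking rule (total time first, then click count) are assumptions made for illustration; the abstract does not describe how F-Lingo stores or weights these statistics.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class DialogEvent:
    """One hypothetical tracking record: a click on an item and its dwell time."""
    item: str
    seconds_open: float


def rank_items(events, top_k=10):
    """Return (item, clicks, total_seconds) tuples, most-attended items first."""
    totals = defaultdict(lambda: [0, 0.0])  # item -> [clicks, seconds]
    for event in events:
        totals[event.item][0] += 1
        totals[event.item][1] += event.seconds_open
    # Sort by total time, then click count, then name for a stable order.
    ranked = sorted(totals.items(),
                    key=lambda kv: (-kv[1][1], -kv[1][0], kv[0]))
    return [(item, clicks, seconds) for item, (clicks, seconds) in ranked[:top_k]]


# Usage with made-up events from a data mining course page.
events = [
    DialogEvent("decision tree", 12.0),
    DialogEvent("overfitting", 30.5),
    DialogEvent("decision tree", 25.0),
    DialogEvent("cross-validation", 4.0),
]
for item, clicks, seconds in rank_items(events, top_k=2):
    print(item, clicks, seconds)
```

Under the stated assumption that more attention signals a more worthwhile item, the top-ranked entries would be the candidates for a follow-up activity or test.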
F-Lingo is a Chrome extension that works on top of the FutureLearn MOOC platform to support content-based language learning of domain-specific terminology for professional and academic purposes.