Deep neural networks have advanced the state of the art in named entity recognition. However, under typical training procedures, advantages over classical methods emerge only with large datasets. As a result, deep learning is employed only when large public datasets or a large budget for manually labeling data is available. In this work, we show that by combining deep learning with active learning, we can outperform classical methods even with a significantly smaller amount of training data.
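The abstract does not specify the acquisition strategy, so the following is only a minimal sketch of pool-based active learning with least-confidence sampling, a common choice for this setting. The names `label_fn`, `train_fn`, and `predict_fn` are hypothetical stand-ins for the labeling oracle, model training, and model inference.

```python
import random

def least_confidence(probs):
    """Uncertainty score: 1 minus the model's top class probability."""
    return 1.0 - max(probs)

def active_learning_loop(unlabeled, label_fn, train_fn, predict_fn,
                         rounds=5, batch_size=2, seed=0):
    """Pool-based active learning with least-confidence sampling.

    unlabeled: list of examples; label_fn(x) -> label (the human oracle);
    train_fn(labeled_pairs) -> model; predict_fn(model, x) -> class probabilities.
    """
    random.seed(seed)
    pool = list(unlabeled)

    # Seed the labeled set with a small random sample.
    seed_batch = random.sample(pool, batch_size)
    labeled = [(x, label_fn(x)) for x in seed_batch]
    for x in seed_batch:
        pool.remove(x)
    model = train_fn(labeled)

    for _ in range(rounds):
        if not pool:
            break
        # Query the examples the current model is least confident about.
        ranked = sorted(pool,
                        key=lambda x: least_confidence(predict_fn(model, x)),
                        reverse=True)
        batch = ranked[:batch_size]
        labeled.extend((x, label_fn(x)) for x in batch)
        for x in batch:
            pool.remove(x)
        model = train_fn(labeled)  # retrain on the enlarged labeled set
    return model, labeled
```

The point of the loop is that labeling effort concentrates on examples the model finds hardest, which is how a deep model can reach strong performance from far fewer labels than random sampling would require.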
Categorical effects are found across speech sound categories, with the degree of these effects ranging from extremely strong categorical perception in consonants to nearly continuous perception in vowels. We show that both strong and weak categorical effects can be captured by a unified model. We treat speech perception as a statistical inference problem, assuming that listeners use their knowledge of categories as well as the acoustics of the signal to infer the intended productions of the speaker. Simulations show that the model provides close fits to empirical data, unifying past findings of categorical effects in consonants and vowels and capturing differences in the degree of categorical effects through a single parameter.
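The inference described above can be sketched with standard Gaussian assumptions: categories are Gaussian distributions over productions, the signal is the production plus Gaussian noise, and the listener's percept is the posterior mean of the intended production. The specific parameter values and function names below are illustrative, not taken from the paper; the single parameter controlling the strength of categorical effects corresponds here to the ratio of category variance to noise variance.

```python
import math

def category_posterior(S, mus, sigma_c2, sigma_s2):
    """P(category | signal S), for Gaussian categories N(mu_c, sigma_c2)
    and Gaussian perceptual noise with variance sigma_s2."""
    # The marginal likelihood of S under category c is N(mu_c, sigma_c2 + sigma_s2).
    var = sigma_c2 + sigma_s2
    liks = [math.exp(-(S - mu) ** 2 / (2 * var)) for mu in mus]
    Z = sum(liks)
    return [l / Z for l in liks]

def expected_target(S, mus, sigma_c2, sigma_s2):
    """Posterior mean of the intended production T given the heard signal S."""
    post = category_posterior(S, mus, sigma_c2, sigma_s2)
    # Within a category, E[T | S, c] shrinks S toward the category mean mu_c.
    shrink = sigma_c2 / (sigma_c2 + sigma_s2)
    means = [shrink * S + (1 - shrink) * mu for mu in mus]
    return sum(p * m for p, m in zip(post, means))
```

When noise variance is large relative to category variance, percepts are pulled strongly toward category means (consonant-like, strongly categorical perception); when it is small, percepts track the acoustics almost exactly (vowel-like, nearly continuous perception).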
In this paper, we present MonoTrans2, a new user interface to support monolingual translation; that is, translation by people who speak only the source or target language, but not both. Compared to previous systems, MonoTrans2 supports multiple edits in parallel, and shorter tasks with less translation context. In an experiment translating children's books, we show that MonoTrans2 is able to substantially close the gap between machine translation and human bilingual translations. The percentage of sentences rated 5 out of 5 for fluency and adequacy by both bilingual evaluators in our study increased from 10% for Google Translate output to 68% for MonoTrans2.
Previous research in speech perception has shown that category information affects the discrimination of consonants to a greater extent than vowels. However, there has been little electrophysiological work on the perception of fricative sounds, which are informative for this contrast as they share properties with both consonants and vowels. In the current study we address the relative contribution of phonological and acoustic information to the perception of sibilant fricatives using event-related fields (ERFs) and dipole modeling with magnetoencephalography (MEG). We show that the field strength of neural responses peaking approximately 200 ms after sound onset co-varies with acoustic factors, while the cortical localization of earlier M100 responses suggests a stronger influence of phonological categories. We propose that neural equivalents of categorical perception for fricative sounds are best seen using localization measures, and that spectral cues are spatially coded in human cortex.
In 2007, we began an outreach program in linguistics with psychology students at a local majority-minority high school. In the years since, the initial collaboration has grown to include other schools and has nurtured a culture of community engagement in the language sciences at the University of Maryland. The program has benefited both the public school students and the University researchers involved. Over the years, our efforts have developed into a multi-faceted outreach program targeting primary and secondary schools as well as the broader public. Through our outreach, we aim to take a modest step toward increasing public awareness and appreciation of the importance of language science, toward integrating research into the school curriculum, and toward giving potential first-generation college students a taste of what they are capable of. In this article, we describe our motivations and goals, the activities themselves, and where we can go from here.
Targeted paraphrasing is a new approach to the problem of obtaining cost-effective, reasonable-quality translation, which makes use of simple and inexpensive human computations by monolingual speakers in combination with machine translation. The key insight behind the process is that it is possible to spot likely translation errors with only monolingual knowledge of the target language, and to generate alternative ways to say the same thing (i.e., paraphrases) with only monolingual knowledge of the source language. Formal evaluation demonstrates that this approach can yield substantial improvements in translation quality. The idea has also been integrated into a broader framework for monolingual collaborative translation that produces fully accurate, fully fluent translations for a majority of sentences in a real-world translation task, with no involvement of bilingual speakers.
scite is a Brooklyn-based organization that helps researchers discover and understand research articles through Smart Citations: citations that display the context in which an article is cited and indicate whether it provides supporting or contrasting evidence. scite is used by students and researchers around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.