In 1972, Labov, Yaeger, and Steiner reported a series of "near-mergers" that have since proved difficult to assimilate to the standard conception of the phoneme and that challenge our current understanding of how language production is related to perception and learning (Labov, Yaeger, & Steiner, 1972). In these situations, speakers consistently reported that two classes of sounds were "the same," yet consistently differentiated them in production. Labov (1975a) suggested that this phenomenon explained two "falsely reported mergers" in the history of English, where word classes were said to have merged and afterward separated. It appears that sound change may bring two phonemes into such close approximation that semantic contrast between them is suspended for native speakers of the dialect, without necessarily leading to merger. This article reports further observations of near-mergers, which confirm their implications for both synchronic and diachronic issues, and presents experimental results showing how phonemic contrast is suspended for an entire community.
Short-wave infrared (SWIR) imaging sensors are increasingly being used in surveillance and reconnaissance systems due to the reduced scatter in haze and the spectral response of materials over this wavelength range. Typically, SWIR images have been provided either as full-motion video from framing panchromatic systems or as spectral data cubes from line-scanning hyperspectral or multispectral systems. Here, we describe and characterize a system that bridges this divide, providing nine-band spectral images at 30 Hz. The system integrates a custom array of filters onto a commercial SWIR InGaAs array. We measure the filter placement and spectral response. We demonstrate a simple simulation technique to facilitate optimization of band selection for future sensors.
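As a minimal sketch of the kind of band-selection simulation the abstract mentions, one can collapse a hyperspectral data cube into a small number of simulated filter bands by weighting the cube with assumed filter response curves. The Gaussian filter model, band centers, and bandwidths below are illustrative assumptions, not the paper's measured filter responses.

```python
import numpy as np

def gaussian_response(wavelengths, center, fwhm):
    """Idealized filter transmission curve over a wavelength grid (nm)."""
    sigma = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    return np.exp(-0.5 * ((wavelengths - center) / sigma) ** 2)

def simulate_bands(cube, wavelengths, centers, fwhm=50.0):
    """Collapse a (rows, cols, lambda) cube into one image per filter band.

    Each output band is the response-weighted average of the cube along
    the spectral axis, i.e., an idealized filtered detector measurement.
    """
    bands = []
    for center in centers:
        r = gaussian_response(wavelengths, center, fwhm)
        bands.append((cube * r).sum(axis=-1) / r.sum())
    return np.stack(bands, axis=-1)  # (rows, cols, n_bands)

# Toy example: a random 64x64 cube sampled over the SWIR range
# (~1000-1700 nm, typical of InGaAs), reduced to nine evenly spaced
# simulated bands. Real use would substitute measured cube data and
# candidate filter curves, then score each band set on a target task.
wl = np.linspace(1000.0, 1700.0, 200)
cube = np.random.rand(64, 64, wl.size)
centers = np.linspace(1050.0, 1650.0, 9)
nine_band = simulate_bands(cube, wl, centers)
print(nine_band.shape)  # (64, 64, 9)
```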
While neural networks have been employed to handle several different text-to-speech tasks, ours is the first system to use neural networks throughout, for both linguistic and acoustic processing. We divide the text-to-speech task into three subtasks: a linguistic module mapping from text to a linguistic representation, an acoustic module mapping from the linguistic representation to speech, and a video module mapping from the linguistic representation to animated images. The linguistic module employs a letter-to-sound neural network and a postlexical neural network. The acoustic module employs a duration neural network and a phonetic neural network. The visual neural network is employed in parallel with the acoustic module to drive a talking head. The use of neural networks that can be retrained on the characteristics of different voices and languages affords our system a degree of adaptability and naturalness heretofore unavailable.
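The following sketch illustrates the modular pipeline structure the abstract describes: two linguistic networks in series feeding an acoustic stage, with a visual stage driven in parallel off the same linguistic representation. The stub functions here are placeholders standing in for trained networks, not the paper's code.

```python
def letter_to_sound(text):
    """Stand-in for the letter-to-sound network: text -> phoneme sequence."""
    return list(text.lower())  # placeholder "phonemes"

def postlexical(phonemes):
    """Stand-in for the postlexical network: contextual adjustments."""
    return [p for p in phonemes if p != ' ']

def predict_durations(phonemes):
    """Stand-in for the duration network: one duration (s) per phoneme."""
    return [0.08] * len(phonemes)

def phonetic_to_acoustics(phonemes, durations):
    """Stand-in for the phonetic network: phonemes -> acoustic frames."""
    return [(p, d) for p, d in zip(phonemes, durations)]

def visual_frames(phonemes, durations):
    """Stand-in for the visual network: phonemes -> talking-head frames."""
    return [f"mouth_shape({p})" for p in phonemes]

def synthesize(text):
    # Linguistic module: letter-to-sound, then postlexical processing.
    phonemes = postlexical(letter_to_sound(text))
    # Acoustic module: duration prediction, then acoustic generation.
    durations = predict_durations(phonemes)
    audio = phonetic_to_acoustics(phonemes, durations)
    # Visual module runs in parallel with the acoustic module,
    # driven by the same linguistic representation.
    video = visual_frames(phonemes, durations)
    return audio, video

audio, video = synthesize("hello world")
```

Retraining, in this architecture, amounts to swapping the weights behind each stub while keeping the interfaces fixed, which is what gives the system its claimed adaptability across voices and languages.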
We discuss the notion of language- and dialect-specific search in the context of audio indexing. A system is described where users can find dialect- or language-specific pronunciations of Afghan placenames in Dari and Pashto. We explore the efficacy of a phonetic speech recognition system employed for this task.
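As a minimal sketch of phonetic search over a pronunciation index, the kind of lookup the abstract describes, one can match a recognized phone sequence against stored dialect-specific pronunciations by edit distance. The phone strings and index below are illustrative placeholders, and plain Levenshtein distance stands in for whatever scoring the actual system uses.

```python
def edit_distance(a, b):
    """Levenshtein distance between two phone sequences."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, 1):
        cur = [i]
        for j, pb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (pa != pb)))   # substitution
        prev = cur
    return prev[-1]

# Hypothetical index: placename -> dialect-specific pronunciations,
# each a list of phone symbols (invented for illustration).
index = {
    "Kabul":    [["k", "aa", "b", "ul"], ["k", "a", "b", "o", "l"]],
    "Herat":    [["h", "e", "r", "aa", "t"]],
    "Kandahar": [["k", "a", "n", "d", "a", "h", "aa", "r"]],
}

def search(query_phones, index, max_dist=2):
    """Return placenames whose best pronunciation is within max_dist phones."""
    hits = []
    for name, prons in index.items():
        best = min(edit_distance(query_phones, p) for p in prons)
        if best <= max_dist:
            hits.append((best, name))
    return [name for _, name in sorted(hits)]

# Query phones as a phonetic recognizer might emit them for a spoken query.
print(search(["k", "a", "b", "ul"], index))  # ['Kabul']
```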