A portable computational system called TADA was developed for the Task Dynamic model of speech motor control [Saltzman and Munhall, Ecol. Psychol. 1, 333–382 (1989)]. The model maps from a set of linguistic gestures, specified as activation functions with corresponding constriction goal parameters, to time functions for a set of model articulators. The original Task Dynamic code was ported to the (relatively) platform-independent MATLAB environment and includes a MATLAB version of the Haskins articulatory synthesizer, so that articulator motions computed by the Task Dynamic model can be used to generate sound. Gestural scores can now be edited graphically and the effects of gestural score changes on the model's output evaluated. Other new features of the system include: (1) a graphical user interface that displays the input gestural scores, output time functions of constriction goal variables and articulators, and an animation of the resulting vocal-tract motion; (2) integration of the Task Dynamic model with the prosodic clock-slowing, pi-gesture model of Byrd and Saltzman [J. Phonetics 31, 149–180 (2003)]. This allows prosody-driven slowing to be applied to the full set of active gestures and its effects to be evaluated perceptually. [Work supported by NIH.]
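The core of the Task Dynamic mapping can be illustrated with a toy simulation. The sketch below is not TADA's actual code: it assumes, for illustration, a single tract variable driven toward a constriction target by a critically damped second-order system (m z'' + b z' + k (z − z0) = 0, with the gesture's activation gating the dynamics), which is the general form of point-attractor dynamics the model family uses. All names and parameter values are hypothetical.

```python
import numpy as np

def simulate_gesture(z0_target, duration=0.5, dt=0.001, k=400.0, z_init=0.0):
    """Toy point-attractor dynamics for one tract variable (illustrative only).

    A gesture is modeled as a critically damped second-order system pulling
    the tract variable z toward the constriction target z0_target, gated by
    a 0/1 activation function over the gesture's active interval.
    """
    b = 2.0 * np.sqrt(k)           # critical damping (mass normalized to 1)
    n = int(duration / dt)
    activation = np.ones(n)        # gesture active over the whole window here
    z, v = z_init, 0.0
    trajectory = np.empty(n)
    for i in range(n):
        acc = activation[i] * (-b * v - k * (z - z0_target))
        v += acc * dt              # semi-implicit Euler integration
        z += v * dt
        trajectory[i] = z
    return trajectory

traj = simulate_gesture(z0_target=1.0)
```

Because the system is critically damped and starts at rest, the trajectory approaches the target smoothly without overshoot, which is the qualitative behavior task-dynamic gestures are designed to exhibit.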
Despite the lack-of-invariance problem (the many-to-many mapping between acoustics and percepts), human listeners experience phonetic constancy and typically perceive what a speaker intends. Most models of human speech recognition (HSR) have side-stepped this problem, working with abstract, idealized inputs and deferring the challenge of working with real speech. In contrast, carefully engineered deep learning networks allow robust, real-world automatic speech recognition (ASR). However, the complexities of deep learning architectures and training regimens make it difficult to use them to provide direct insights into mechanisms that may support HSR. In this brief article, we report preliminary results from a two-layer network that borrows one element from ASR, long short-term memory nodes, which provide dynamic memory over a range of temporal spans. This allows the model to learn to map real speech from multiple talkers to semantic targets with high accuracy, with a human-like time course of lexical access and phonological competition. Internal representations emerge that resemble phonetically organized responses in human superior temporal gyrus, suggesting that the model develops a distributed phonological code despite no explicit training on phonetic or phonemic targets. The ability to work with real speech is a major advance for cognitive models of HSR.
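The memory mechanism borrowed from ASR can be made concrete with a minimal sketch of a single long short-term memory (LSTM) step in NumPy. This is not the authors' model; the weight shapes, dimensions, and input here are illustrative stand-ins, showing only how the gated cell state carries information across variable temporal spans.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step. W: (4H, D) input weights, U: (4H, H) recurrent
    weights, b: (4H,) bias; gates are stacked in one matrix for brevity."""
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0:H])         # input gate: admit new information
    f = sigmoid(z[H:2*H])       # forget gate: sets the memory span
    o = sigmoid(z[2*H:3*H])     # output gate
    g = np.tanh(z[3*H:4*H])     # candidate cell update
    c = f * c_prev + i * g      # cell state carries long-range memory
    h = o * np.tanh(c)          # hidden state exposed to the next layer
    return h, c

# Run the cell over a short sequence of MFCC-like frames (synthetic data).
D, H = 13, 16
rng = np.random.default_rng(1)
W = rng.normal(scale=0.1, size=(4 * H, D))
U = rng.normal(scale=0.1, size=(4 * H, H))
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for x in rng.normal(size=(50, D)):
    h, c = lstm_step(x, h, c, W, U, b)
```

Because the forget gate is learned, different units can retain information over short spans (useful for phonetic detail) or long spans (useful for lexical and semantic targets), which is the property the abstract highlights.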
This study investigated the cognitive effects of the flipped classroom approach in a content-based instructional context by comparing second language learners' discourse in flipped vs. traditional classrooms in terms of (1) participation rate, (2) content of comments, (3) reasoning skills, and (4) interactional patterns. Learners in two intact classes participated and were taught in either a flipped classroom (n = 26) or a traditional classroom (n = 25). In the flipped class, the learners listened to an online lecture before class and participated in a small-group discussion in class. In contrast, the learners in the traditional class listened to a teacher-led lecture in class and then immediately participated in a small-group discussion in class. The learners' discussions were audio-recorded. Quantitative and qualitative analyses indicated no difference in participation rates; however, the students in the flipped classroom produced more cognitive comments involving deeper information processing and higher-order reasoning skills and showed more cohesive interactional patterns than did the students in the traditional classroom. These results indicate that flipped classrooms can effectively promote higher-order thinking processes and in-depth, cohesive discussion in the content-based second language classroom.
Many studies have claimed that articulatory information can be used to improve the performance of automatic speech recognition systems. Unfortunately, such articulatory information is not readily available in typical speaker-listener situations. Consequently, such information has to be estimated from the acoustic signal in a process usually termed "speech inversion." This study proposes and compares various machine learning strategies for speech inversion: trajectory mixture density networks (TMDN), feedforward artificial neural networks (FF-ANN), support vector regression (SVR), autoregressive artificial neural networks (AR-ANN), and distal supervised learning (DSL). Further, using a database generated by the Haskins Laboratories speech production model, we test the claim that information regarding constrictions produced by the distinct organs of the vocal tract (vocal tract variables) is superior to flesh-point information (articulatory pellet trajectories) for the inversion process.
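The speech-inversion setup can be sketched as a supervised regression problem: acoustic feature frames in, articulatory (tract-variable) targets out. The toy example below trains a one-hidden-layer feedforward network, loosely corresponding to the FF-ANN strategy, on synthetic data; the dimensions, data, and training details are illustrative assumptions, not the study's actual configuration or the Haskins database.

```python
import numpy as np

# Synthetic stand-in data: n frames of 13-dim acoustic features mapped to
# 8 tract-variable targets through an arbitrary nonlinear ground-truth map.
rng = np.random.default_rng(0)
n, d_ac, d_tv, n_hidden = 200, 13, 8, 32
X = rng.normal(size=(n, d_ac))            # MFCC-like acoustic frames
true_W = rng.normal(size=(d_ac, d_tv))
Y = np.tanh(X @ true_W)                   # tract-variable targets

# One-hidden-layer network trained by plain gradient descent on MSE.
W1 = rng.normal(scale=0.1, size=(d_ac, n_hidden))
W2 = rng.normal(scale=0.1, size=(n_hidden, d_tv))
mse0 = float(np.mean((np.tanh(X @ W1) @ W2 - Y) ** 2))  # loss before training

lr = 0.01
for _ in range(500):
    Hid = np.tanh(X @ W1)                 # forward pass
    err = Hid @ W2 - Y
    gW2 = Hid.T @ err / n                 # backprop through output layer
    gHid = err @ W2.T * (1 - Hid ** 2)    # backprop through tanh
    gW1 = X.T @ gHid / n
    W1 -= lr * gW1
    W2 -= lr * gW2

mse = float(np.mean((np.tanh(X @ W1) @ W2 - Y) ** 2))
```

The same input/output framing applies whichever regressor is plugged in (TMDN, SVR, AR-ANN, DSL); the strategies differ in how they handle the one-to-many ambiguity and temporal structure of the inverse map, which simple frame-wise regression like this ignores.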
Timed grammaticality judgment tests (TGJT) and oral elicited imitation tests (OEIT) are considered reliable and valid measures of implicit linguistic knowledge, but studies consistently observe better performance on the TGJT than the OEIT due to the different types of processing they require: comprehension for the TGJT and production for the OEIT. This study examines whether degree of access to implicit knowledge is a function of processing type. Results from a series of factor analyses suggest that the OEIT requires greater access to implicit knowledge—implying that it measures stronger implicit knowledge—than the TGJT. Furthermore, the study examines effects on construct validity of time pressure in the OEIT (uncontrolled vs. controlled) and modality in the TGJT (written vs. aural). The results indicate that the tests reached higher construct validity, or measured stronger implicit knowledge, when the OEIT employed controlled time pressure and the TGJT used aural stimuli.