This article describes a neural network model of speech motor skill acquisition and speech production that explains a wide range of data on variability, motor equivalence, coarticulation, and rate effects. Model parameters are learned during a babbling phase. To explain how infants learn language-specific variability limits, speech sound targets take the form of convex regions, rather than points, in orosensory coordinates. Reducing target size for better accuracy during slower speech leads to differential effects for vowels and consonants, as seen in experiments previously used as evidence for separate control processes for the 2 sound types. Anticipatory coarticulation arises when targets are reduced in size on the basis of context; this generalizes the well-known look-ahead model of coarticulation. Computer simulations verify the model's properties.

The primary goal of the modeling work described in this article is to provide a coherent theoretical framework that explains a wide range of data concerning the articulator movements used by humans to produce speech sounds. This is carried out by formulating a model that transforms strings of phonemes into continuous articulator movements for producing these phonemes. This study of speech production is largely motivated by the following question of speech acquisition: How does an infant acquire the motor skills needed to produce the speech sounds of his or her native language?

Speech production involves complex interactions among several different reference frames. A phonetic frame describes the sounds a speaker wishes to produce, and the signals that convey these sound units to a listener exist within an acoustic frame. Tactile and proprioceptive signals form an orosensory frame (e.g., Perkell, 1980) that describes the shape of the vocal tract, and the muscles controlling the positions of individual articulators make up an articulatory frame. The parameters governing the interactions among these frames cannot be fixed at birth. One reason for this is the language specificity of these interactions. For example, English listeners distinguish between the sounds /r/ and /l/, but Japanese listeners do not. Corresponding differences are seen in the articulator movements of the two groups (Miyawaki et al., 1975). Thus, despite some obvious commonalities between the phonetics of different languages (e.g., widespread use of consonants like /d/, /n/, and /s/ across the world's languages), the precise nature of the mappings between acoustic goals and articulator movements depends on the language being learned.
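To make the target-as-region idea summarized above concrete, the following sketch (not the model's actual neural network implementation) represents a speech sound target as an axis-aligned convex region in a hypothetical orosensory coordinate space. Shrinking the region illustrates the tighter accuracy demands assumed for slower speech, and intersecting it with the region compatible with an upcoming sound gives a simple look-ahead-style reduction of the target based on context. All names, dimensions, and numerical values are illustrative assumptions.

```python
import numpy as np

class ConvexTarget:
    """A speech sound target as an axis-aligned convex region (box) in an
    illustrative orosensory space; coordinate names are assumptions."""

    def __init__(self, lower, upper):
        self.lower = np.asarray(lower, dtype=float)
        self.upper = np.asarray(upper, dtype=float)

    def shrink(self, factor):
        """Shrink the region toward its center, e.g., to model the greater
        accuracy demanded during slower, more careful speech (0 < factor <= 1)."""
        center = (self.lower + self.upper) / 2.0
        half = (self.upper - self.lower) / 2.0 * factor
        return ConvexTarget(center - half, center + half)

    def intersect(self, other):
        """Restrict the target to configurations also compatible with an
        upcoming sound: a look-ahead-style, context-based target reduction."""
        return ConvexTarget(np.maximum(self.lower, other.lower),
                            np.minimum(self.upper, other.upper))

    def contains(self, x):
        """Check whether an orosensory state x already satisfies the target."""
        x = np.asarray(x, dtype=float)
        return bool(np.all(x >= self.lower) and np.all(x <= self.upper))

# Illustrative 2-D orosensory space (e.g., tongue-body height and frontness).
vowel = ConvexTarget([0.2, 0.1], [0.8, 0.6])      # broad vowel target region
upcoming = ConvexTarget([0.5, 0.0], [1.0, 0.5])   # region compatible with the next sound

careful = vowel.shrink(0.5)                # smaller target for slow, careful speech
coarticulated = vowel.intersect(upcoming)  # target reduced on the basis of context

print(careful.contains([0.5, 0.35]))       # True: state lies inside the reduced target
print(coarticulated.contains([0.3, 0.2]))  # False: excluded by the upcoming context
```

On this view, any configuration inside the region counts as an acceptable production of the sound, so larger regions permit more variability and motor equivalence, while context- or rate-based reduction of the region yields coarticulatory and rate effects without separate control processes.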