Since the wave form of speech is apparently continuous and the phonemic entity is discrete, it is reasonable to expect that during the developmental stages of speech compression systems the process will gradually evolve from continuous to mixed discrete-continuous and finally to a completely discrete type. Two plans for such systems are discussed. Plan I is an attempt to use two discrete features of speech together with continuous extractions of the formants, moments, and pitch of speech. Plan II is an attempt to identify phoneme-like elements through a scheme of successive selection.
A system for the analysis and synthesis of speech is described. In the analyzer, speech sounds are first classified into nonturbulent and turbulent groups. The first three formants of the former group and the first three moments of the latter group constitute six significant parameters of speech spectra. These parameters, or their equivalents, are extracted by measuring the zero-crossing densities and/or envelopes of automatically selected frequency bands. To test the feasibility of using these parameters in a speech compression system, a synthesis procedure is carried out. The synthesized speech and its spectrograms are demonstrated. [The research in this paper has been made possible through support and sponsorship extended by the Electronics Research Directorate of the Air Force Cambridge Research Center, under Contract No. AF 19(604)-1039, Item I.]
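As a rough illustration of the zero-crossing density measurement mentioned above, the following sketch counts sign changes per second in a band that is assumed to have been isolated already (the band-selection filter is not shown). The function names and the rule that a dominant frequency is half the crossing density are illustrative assumptions, not the authors' circuit.

```python
import math


def zero_crossing_density(samples, sample_rate):
    """Return the number of sign changes per second in `samples`."""
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0)
    )
    duration = (len(samples) - 1) / sample_rate  # seconds spanned by the samples
    return crossings / duration


def frequency_from_density(density):
    """Assumption: one dominant sinusoid, which crosses zero twice per cycle."""
    return density / 2.0


# Hypothetical test signal: a 500 Hz tone standing in for a band dominated by one formant.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 500 * n / sample_rate + 0.3) for n in range(sample_rate)]
estimate = frequency_from_density(zero_crossing_density(tone, sample_rate))
```

The estimate is only meaningful when a single component dominates the band, which is presumably why the abstract speaks of automatically selected frequency bands.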
The points of zero crossing and the points of zero slope of the oscillograms of speech sounds are considered to contain the essential information for intelligibility. The intervals between zero crossings, θ0, and the intervals between zero slopes, θm, are plotted as points in rectangular coordinates. The ordinate of each dot is a function of θ (θ0 or θm), and the abscissa is a function of the time of occurrence t of that particular θ. The choice of these functions depends upon the type of portrayal needed for a specific analysis of speech sounds. The resulting intervalgram gives a halftone picture (consisting of dots) of speech sounds. The patterns may be proportioned to show either a detailed or a general representation of the variation of the interval distribution. One type of pattern, portrayed at the speech rate on a cathode-ray oscilloscope with a long-persistence screen, has been found quite similar in certain respects to the patterns obtainable with the sound spectrograph described in the book Visible Speech by Potter, Kopp, and Green. The equipment involved in obtaining the intervalgram, however, is much simpler.
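The construction of the intervalgram points can be sketched for a sampled signal as follows. This is a minimal digital analogue, not the original apparatus: the function names are hypothetical, crossings are located by linear interpolation, zero slopes are approximated by sign changes of the first difference, and both plotting functions are taken to be the identity (ordinate = θ, abscissa = t), although the abstract leaves those functions open.

```python
import math


def crossing_times(samples, sample_rate):
    """Times (seconds) at which the sequence changes sign, linearly interpolated."""
    times = []
    for i in range(1, len(samples)):
        a, b = samples[i - 1], samples[i]
        if (a < 0 <= b) or (a >= 0 > b):
            frac = a / (a - b)  # fraction of the sample step at which the line hits zero
            times.append((i - 1 + frac) / sample_rate)
    return times


def intervals(times):
    """Pair each interval theta with the time t at which it ends."""
    return [(t1, t1 - t0) for t0, t1 in zip(times, times[1:])]


def intervalgram(samples, sample_rate):
    """Return (t, theta0) and (t, theta_m) point lists for plotting as dots."""
    zero_times = crossing_times(samples, sample_rate)
    # The first difference approximates the slope midway between two samples.
    slope = [b - a for a, b in zip(samples, samples[1:])]
    slope_times = [t + 0.5 / sample_rate for t in crossing_times(slope, sample_rate)]
    return intervals(zero_times), intervals(slope_times)


# Hypothetical test signal: a 100 Hz tone, whose intervals should all be near 5 ms.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 100 * n / sample_rate + 0.3) for n in range(800)]
theta0_points, thetam_points = intervalgram(tone, sample_rate)
```

Plotting each list as a scatter of dots, with t on the horizontal axis and θ (or some function of it, such as its reciprocal) on the vertical axis, gives the kind of dot pattern the abstract describes.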