Recent developments in hearing theory have resulted in the rather general acceptance of the idea that the perception of pitch of complex sounds is the result of a psychological pattern recognition process. The pitch is supposedly mediated by the fundamental of the harmonic spectrum which fits the spectrum of the complex sound optimally. The problem of finding the pitch is then equivalent to finding the best harmonic match. Goldstein [J. Acoust. Soc. Am. 54, 1496-1516 (1973)] has described an objective procedure for finding the best fit for stimuli containing relatively few spectral components. He uses a maximum likelihood criterion. Application of this procedure to various data on the pitch of complex sounds yielded good results. This motivated our efforts to apply the pattern recognition theory of pitch to the problem of measuring pitch in speech. Although we were able to follow the main line of Goldstein's procedure, some essential changes had to be made. The most important is that in our implementation not all spectral components of the complex sound have to be classified as belonging to the harmonic pattern. We introduced a harmonics sieve to determine whether components are rejected or accepted at a candidate pitch. A simple criterion, based on the components accepted and rejected, determined which candidate pitch was finally selected. The performance and reliability of this psychoacoustically based pitch meter were tested in an LPC-vocoder system.
There is, however, an alternative approach to the problem, which, in our belief, can be highly successful. To begin with, pitch (e.g., of speech) is a subjective quantity. Therefore one might argue that the pitch meter which operates according to the principles of the human pitch extractor (the auditory system) will attain the optimum level of performance. This is un- , 1978).
We propose that (1) this theory is also applicable to the (subjective) perception of pitch in speech and (2) the theory can be put into the form of an (objective) algorithm which will produce pitch values that have psychophysical validity as well as practical applicability. This validity stems from the fact that the data reduction in the algorithm proposed here is based on constraints known from hearing theory, which in turn relies on psychoacoustical and physiological data.
In this paper we will not go into the details of the psychoacoustics of pitch. We restrict ourselves to a description of Goldstein's theory. We shall then discuss the additional steps that are involved in its application to speech material. Finally, the resulting algorithm is presented together with some data on its performance. The algorithm will briefly be compared with existing algorithms. As an example we present results of a direct comparison with the parallel processing pitch detector (PPROC) by Gold and Rabiner (1969).
Considering the central processor as a system that has to match a set of frequencies to a harmonic pattern emphasizes the relation to pattern recognition. The patte...
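The harmonics sieve mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the relative mesh width (`tolerance`), the highest harmonic number considered (`max_harmonic`), the candidate grid, the scoring rule (accepted minus rejected minus missing harmonics), and the least-squares refinement are all assumptions chosen for illustration, as are the function names. The sketch only shows the general idea that each candidate pitch accepts the components lying near one of its harmonics, rejects the rest, and is then ranked by a simple count-based criterion.

```python
def sieve(components, f0, tolerance=0.04, max_harmonic=10):
    """Pass component frequencies (Hz) through a sieve with meshes at n * f0.

    Returns (accepted, rejected): accepted maps harmonic number -> frequency,
    rejected lists the components that fell outside every mesh.
    The 4 % mesh half-width and the limit of 10 harmonics are assumed values.
    """
    accepted = {}
    rejected = []
    for f in components:
        n = round(f / f0)  # nearest harmonic number
        fits = 1 <= n <= max_harmonic and abs(f - n * f0) <= tolerance * n * f0
        if not fits:
            rejected.append(f)
        elif n not in accepted:
            accepted[n] = f
        elif abs(f - n * f0) < abs(accepted[n] - n * f0):
            # Two components in one mesh: keep the closer one.
            rejected.append(accepted[n])
            accepted[n] = f
        else:
            rejected.append(f)
    return accepted, rejected


def rank(accepted, rejected, f0):
    """Hypothetical criterion: reward accepted components, penalize rejected
    components and empty meshes below the highest accepted harmonic.
    Ties are broken by the smaller summed relative deviation."""
    if not accepted:
        return (float("-inf"), 0.0)
    missing = max(accepted) - len(accepted)
    residual = sum(abs(f - n * f0) / (n * f0) for n, f in accepted.items())
    return (len(accepted) - len(rejected) - missing, -residual)


def estimate_pitch(components, f0_min=50.0, f0_max=500.0, step=1.0):
    """Scan candidate pitches and return (refined_f0, accepted, rejected)."""
    best = None
    n_steps = int((f0_max - f0_min) / step)
    for i in range(n_steps + 1):
        f0 = f0_min + i * step
        accepted, rejected = sieve(components, f0)
        key = rank(accepted, rejected, f0)
        if best is None or key > best[0]:
            best = (key, accepted, rejected)
    _, accepted, rejected = best
    if not accepted:
        return None, accepted, rejected
    # Least-squares fit of f = n * f0 over the accepted components
    # (an assumed stand-in for a maximum likelihood estimate).
    refined = (sum(n * f for n, f in accepted.items())
               / sum(n * n for n in accepted))
    return refined, accepted, rejected


if __name__ == "__main__":
    # Harmonics 2, 3 and 4 of 100 Hz plus one unrelated component.
    f0, accepted, rejected = estimate_pitch([200.0, 300.0, 400.0, 1130.0])
    print(f0, sorted(accepted), rejected)
```

In this example the components at 200, 300 and 400 Hz are accepted as harmonics 2, 3 and 4 of a 100 Hz fundamental, while the 1130 Hz component is rejected instead of being forced into the pattern. The penalty for empty meshes is what keeps a subharmonic candidate such as 50 Hz from winning, since it would accept the same components but leave many lower harmonics unfilled.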