A rationale is advanced for digitally coding speech signals in terms of sub‐bands of the total spectrum. The approach provides a means for controlling and reducing quantizing noise in the coding. Each sub‐band is quantized with an accuracy (bit allocation) based upon perceptual criteria. As a result, the quality of the coded signal is improved over that obtained from a single full‐band coding of the total spectrum. In one implementation, the individual sub‐bands are low‐pass translated before coding. In another, “integer‐band” sampling is employed to alias the signal in an advantageous way before coding. Other possibilities extend to complex demodulation of the sub‐bands, and to representing the sub‐band signals in terms of envelopes and phase‐derivatives. In all techniques, adaptive quantization is used for the coding, and a parsimonious allocation of bits is made across the bands. Computer simulations are made to demonstrate the signal qualities obtained for codings at 16 and 9.6 kb/s.
Division of Speech Spectrum into Sub-Bands
For digital transmission a signal must be sampled and quantized. Quantization is a nonlinear operation and produces distortion products which are typically broad in spectrum. Because of the characteristics of the speech spectrum, quantizing distortion is not equally detectable at all frequencies. Coding the signal in narrower sub-bands offers one possibility for controlling the distribution of quantizing noise across the signal spectrum, and hence for realizing an improvement in signal quality.
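The bit-rate bookkeeping behind a per-band bit allocation can be illustrated with a short sketch. It assumes that an "integer-band" is one whose edges lie at m·W and (m+1)·W for integer m and bandwidth W, so that it can be sampled directly at 2W samples per second. The band edges and bit allocations below are hypothetical values chosen only to total 16 kb/s; they are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubBand:
    low_hz: int   # lower band edge (hypothetical)
    high_hz: int  # upper band edge (hypothetical)
    bits: int     # bits per sample allotted to this band (hypothetical)

    @property
    def width_hz(self) -> int:
        return self.high_hz - self.low_hz

    def is_integer_band(self) -> bool:
        # Assumed condition: the band spans [m*W, (m+1)*W] for an integer m.
        return self.low_hz % self.width_hz == 0

    @property
    def bit_rate(self) -> int:
        # A band of width W sampled at 2*W samples/s, with `bits` per sample.
        return 2 * self.width_hz * self.bits


# Illustrative allocation: more bits for the low bands, fewer for the high bands.
BANDS = [
    SubBand(0, 500, 4),
    SubBand(500, 1000, 4),
    SubBand(1000, 2000, 2),
    SubBand(2000, 3000, 2),
]


def total_bit_rate(bands) -> int:
    """Sum of the per-band bit rates, in bits per second."""
    return sum(band.bit_rate for band in bands)
```

With these assumed values each band contributes 4000 b/s. Moving a bit from one band to another changes where the quantizing noise falls in the spectrum, and it leaves the total rate unchanged only when the two bands have equal width.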
Speech transmission by switched digital packets offers several opportunities for increasing the utilization of transmission capacity. We comment here upon a combination of variable-quality coding and time-interval modification that can efficiently load a transmission facility and accommodate fluctuating demands on it.
Consider, typically, that a conventional voice switch detects speech energy bursts and demarks each as a packet. A time stamp is given to each packet, and the interburst silences are discarded. Each packet is digitally encoded with a quality that reflects service demands being made on the transmission facility at the moment. Coding bit rate and time stamp are written in the header data for each packet, along with necessary supervisory information, such as destination and source addresses. Successive packets are assembled in a transmit buffer and are transmitted when capacity is available. Figure 1 illustrates the process.
At the receiver, arriving packets are accepted into a receive buffer. The receiver decodes each packet (in accordance with the header bit rate), reassembles the packets in temporal order (according to the time stamp), and reinserts the silent intervals, not necessarily exactly as in the original, but with a variation that is perceptually acceptable.
Relevant design questions include: (i) how much saving in transmission capacity can be achieved by discarding the silent intervals, (ii) what range of signal quality is acceptable in digitally coding the packets, (iii) what latitude is perceptually acceptable in reconstructing the speech silent intervals, (iv) what total round-trip delay time is allowable in a packet system, (v) what transmit and receive buffer sizes are required, and (vi) what packet sizes are attractive for transmission economy.
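The transmit and receive steps described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' system: speech is represented as a list of per-frame energies, a fixed threshold stands in for the voice switch, and the names `Packet`, `packetize`, and `reassemble` are hypothetical. The receiver here reinserts silences exactly, whereas the text allows a perceptually acceptable variation.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    time_stamp: int     # index of the first frame in the burst
    bit_rate_kbps: int  # coding rate recorded in the header
    source: str
    destination: str
    frames: list        # payload; left uncoded in this sketch


def packetize(energies, threshold, bit_rate_kbps, source="A", destination="B"):
    """Mark each run of frames at or above `threshold` as one packet."""
    packets, burst, start = [], [], 0
    for i, energy in enumerate(energies):
        if energy >= threshold:
            if not burst:
                start = i  # a new burst begins here
            burst.append(energy)
        elif burst:
            packets.append(Packet(start, bit_rate_kbps, source, destination, burst))
            burst = []
    if burst:  # close a burst that runs to the end of the input
        packets.append(Packet(start, bit_rate_kbps, source, destination, burst))
    return packets


def reassemble(packets, silence_value=0.0):
    """Order packets by time stamp and refill the gaps with silence."""
    out = []
    for packet in sorted(packets, key=lambda p: p.time_stamp):
        out.extend([silence_value] * (packet.time_stamp - len(out)))
        out.extend(packet.frames)
    return out
```

Because the time stamp travels in the header, packets may arrive in any order and still be placed correctly. Silence after the final burst is not restored, since no later packet marks its extent.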
Speech transmission by switched digital packets provides opportunities for increased utilization of transmission capacity. We describe a combination of processing features designed to maximally load a transmission facility and accommodate fluctuating demands on it. Our system detects individual speech energy bursts and, in effect, discards the silent intervals between bursts. It digitally encodes each burst with a quality (bit rate) that reflects service demands being made on the transmission facility at the moment. In a preliminary study of these features, we find that elimination of within-sentence silent intervals saves about 20% of the total sentence time. We also find that digital coding (by ADPCM) in the range 40–20 Kbits/sec provides useful quality variation. We suggest that these flexibilities and modest buffer storage can aid utilization of transmission capacity.
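As a rough worked example of how the two savings combine, the sketch below computes the kilobits sent for one sentence. The 20% silent fraction and the 40 and 20 kb/s end points come from the abstract; the 10-second sentence duration is an assumption, and header overhead is ignored.

```python
def transmitted_kbits(duration_s, rate_kbps, silent_percent=20):
    """Kilobits sent for one sentence once within-sentence silences are dropped."""
    active_percent = 100 - silent_percent
    return duration_s * rate_kbps * active_percent / 100


# Hypothetical 10-second sentence at the two ends of the ADPCM range.
high_quality = transmitted_kbits(10, 40)
low_quality = transmitted_kbits(10, 20)
```

Under these assumptions the sentence costs 320 kilobits at 40 kb/s and 160 kilobits at 20 kb/s, against 400 kilobits if coded continuously at 40 kb/s with the silences included.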
General rules for production of English phonemes may be studied very effectively by statistical methods. The rules have numerous exceptions, however, involving special treatment of sometimes infrequently occurring conditions. The problem is to identify these exceptions, and then to find enough examples to characterize them. Speech synthesis is especially suitable for this kind of exploratory study. When synthesized utterances are substantially correct, exceptions stand out clearly, as mispronounced words. In our synthesis, a program for regular timing and allophone selection is followed by a set of rewriting rules to handle exceptions. In using the system, when utterances are found for which the rules are inadequate, the program allows quick examination of similar phrases, and interactive editing at the segment level to evaluate potential rule changes. New rules are easily added, or existing ones further qualified. Using this method, we have developed quantitative rules for nasal assimilation, flaps, changes within cluster, syllabic consonants, context-dependent vowel glides, and other adjustments to prevent devoicing and slurring of weak syllables. The rules are nonrecursive, LR1 at the segment level, and programmable in about 80 FORTRAN logical IFs. The oral paper will summarize these rules. Part of the presentation will be in synthetic speech.
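To make the idea of a nonrecursive rewriting rule with one segment of lookahead concrete, here is a minimal sketch of a single flap rule. It is an illustration only: the rule shown is the familiar textbook pattern (a /t/ between a vowel and a following unstressed vowel becomes a flap), not one of the authors' rules, and the ARPAbet-style symbols, the small vowel set, and the function names are assumptions.

```python
# Small illustrative vowel subset; stress is marked by a trailing digit (0 = unstressed).
VOWELS = {"AA", "AE", "AH", "EH", "ER", "IH", "IY", "UW"}


def is_vowel(segment):
    return segment.rstrip("012") in VOWELS


def is_unstressed(segment):
    return segment.endswith("0")


def apply_flap_rule(segments):
    """Single left-to-right pass; each decision looks at most one segment ahead."""
    out = []
    for i, segment in enumerate(segments):
        # Context is read from the input, so rule output is never re-examined.
        previous = segments[i - 1] if i > 0 else None
        following = segments[i + 1] if i + 1 < len(segments) else None
        if (
            segment == "T"
            and previous is not None and is_vowel(previous)
            and following is not None and is_vowel(following)
            and is_unstressed(following)
        ):
            out.append("DX")  # flap symbol (assumed notation)
        else:
            out.append(segment)
    return out
```

For instance, `apply_flap_rule(["B", "AH1", "T", "ER0"])` rewrites the /t/ of "butter" as a flap, while `["AH0", "T", "AE1", "K"]` ("attack") is left unchanged because the following vowel is stressed.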