Diphone synthesis using an overlap-add technique for speech waveforms concatenation

Charpentier, F.; Stella, Maja

doi:10.1109/icassp.1986.1168657

Cited by 107 publications

(55 citation statements)

References 8 publications


“…This was done to avoid ceiling or floor effects. The materials were recorded and digitized at 16 kHz (16 bits quantization) and compressed using the PSOLA algorithm (Charpentier & Stella, 1986). The program operates by first labeling the signal at each consecutive pitch period.…”

Section: Results

mentioning

confidence: 99%

Perceptual adjustment to time-compressed speech: A cross-linguistic study

et al. 1998


Previous research has shown that, when hearers listen to artificially speeded speech, their performance improves over the course of 10-15 sentences, as if their perceptual system was "adapting" to these fast rates of speech. In this paper, we further investigate the mechanisms that are responsible for such effects. In Experiment 1, we report that, for bilingual speakers of Catalan and Spanish, exposure to compressed sentences in either language improves performance on sentences in the other language. Experiment 2 reports that Catalan/Spanish transfer of performance occurs even in monolingual speakers of Spanish who do not understand Catalan. In Experiment 3, we study another pair of languages, namely English and French, and report no transfer of adaptation between these two languages for English-French bilinguals. Experiment 4, with monolingual English speakers, assesses transfer of adaptation from French, Dutch, and English toward English. Here we find that there is no adaptation from French and intermediate adaptation from Dutch. We discuss the locus of the adaptation to compressed speech and relate our findings to other cross-linguistic studies in speech perception.


“…Table I shows the number of syllables, duration values, and articulation rates for the 10 sentences used in each adaptation phase. These sentences were digitized at 16 kHz and compressed by using the PSOLA algorithm (Charpentier & Stella, 1986). This algorithm is fully automatic and general and can be applied to any sounds, speech or nonspeech.…”

mentioning

confidence: 99%

Adaptation to time-compressed speech: Phonological determinants

Sebastián‐Gallés, Dupoux, Costa, et al. 2000

Perception & Psychophysics


Perceptual adaptation to time-compressed speech was analyzed in two experiments. Previous research has suggested that this adaptation phenomenon is language specific and takes place at the phonological level. Moreover, it has been proposed that adaptation should only be observed for languages that are rhythmically similar. This assumption was explored by studying adaptation to different time-compressed languages in Spanish speakers. In Experiment 1, the performances of Spanish-speaking subjects who adapted to Spanish, Italian, French, English, and Japanese were compared. In Experiment 2, subjects from the same population were tested with Greek sentences compressed to two different rates. The results showed adaptation for Spanish, Italian, and Greek and no adaptation for English and Japanese, with French being an intermediate case. To account for the data, we propose that variables other than just the rhythmic properties of the languages, such as the vowel system and/or the lexical stress pattern, must be considered. The Greek data also support the view that phonological, rather than lexical, information is a determining factor in adaptation to compressed speech.

The acoustic/phonetic characteristics of speech vary as a function of speaker, rate of speech, prosody, and so forth. Yet, when we process our native language, we are hardly ever aware of such variability; indeed, these variations are apparently dealt with automatically and effortlessly by the perceptual system. However, when processing artificially degraded speech or when listening to speakers with foreign accents, it is more difficult to make suitable adjustments. Schwab, Nusbaum, and Pisoni (1985) suggest that several sentences are required to adjust to synthetically generated speech. Anecdotally, listening to speech spoken with a foreign accent can also take some time before it becomes fully intelligible. What are the mechanisms responsible for such slow adjustments? Why are some adjustments easier than others? For instance, for a native speaker of English, English spoken with a Dutch accent seems far easier to understand than English spoken with a Japanese accent. Why?

Several studies have shown that language representations in adults are, to some extent, language specific. Listeners behave as if they process speech sounds through the filter of phonemic categories of their maternal language and have difficult...

This research was supported by grants from the Human Frontier Science Program and the Spanish Ministerio de Educacion y Ciencia (Contract PB97-0997) and the Catalan Government (Grup de Recerca Consolidat 5120-UB-05). We thank T. Otake (Dokkyo University), K. Forster, and M. Garrett (both at the University of Arizona) for their help in preparing and recording the Japanese and English materials. A.C. is currently at the Psychology Department, Harvard University. Correspondence concerning this article should be addressed to N. Sebastian-Galles, Universitat de Barcelona, P. de la Vall d'Hebron 171, 08035 Barcelona, Spain (e-mail: sebastia@psico.psi.ub.es).


“…The most commonly-used pitch shifting algorithms are focused on the spectral envelope preservation in order to achieve a natural transformation, modifying as slightly as possible the original timbre [4]. Although many different algorithms have been proposed, most of them are based on overlap-add techniques, like synchronized overlap-and-add (SOLA) [5], time domain-pitch synchronous overlap and add (TD-PSOLA) [6], frequency-domain PSOLA (FD-PSOLA) [7], waveform similarity based SOLA (WSOLA) [8], etc. These techniques consist of excising frames from the voice, processing them and recombining the resulting frames with an overlap-add (OLA) algorithm.…”

Section: Introduction

mentioning

confidence: 99%

Spectral Envelope Transformation in Singing Voice for Advanced Pitch Shifting

et al. 2016


Abstract: The aim of the present work is to perform a step towards more natural pitch shifting techniques in singing voice for its application in music production and entertainment systems. In this paper, we present an advanced method to achieve natural modifications when applying a pitch shifting process to singing voice by modifying the spectral envelope of the audio excerpt. To this end, an all-pole model has been selected to model the spectral envelope, which is estimated using a constrained non-linear optimization. The analysis of the global variations of the spectral envelope was carried out by identifying changes of the parameters of the model along with the changes of the pitch. With the obtained spectral envelope transformation functions, we applied our pitch shifting scheme to some sustained vowels in order to compare results with the same transformation made by using the Flex Pitch plugin of Logic Pro X and pitch synchronous overlap and add technique (PSOLA). This comparison has been carried out by means of both an objective and a subjective evaluation. The latter was done with a survey open to volunteers on our website.
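
The citation statement for this entry describes overlap-add methods as excising frames from the signal, processing them, and recombining them. A minimal Python sketch of that idea follows, assuming plain (not pitch-synchronous) overlap-add with a Hann window and 50% overlap. It is not the PSOLA algorithm of Charpentier & Stella, which positions frames on pitch marks. The names `hann`, `ola_time_scale`, `rate`, and `frame_len` are illustrative and do not come from any of the cited papers.

```python
import math


def hann(n):
    # Periodic Hann window of length n.
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def ola_time_scale(signal, rate, frame_len=256):
    """Plain overlap-add time scaling (assumed sketch, not PSOLA).

    rate > 1 shortens (compresses) the signal, rate < 1 lengthens it.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    syn_hop = frame_len // 2          # frame spacing in the output
    ana_hop = syn_hop * rate          # frame spacing in the input
    window = hann(frame_len)
    n_frames = max(1, int((len(signal) - frame_len) / ana_hop) + 1)
    out_len = (n_frames - 1) * syn_hop + frame_len
    out = [0.0] * out_len
    norm = [0.0] * out_len
    for k in range(n_frames):
        src = int(round(k * ana_hop))  # where the frame is excised
        dst = k * syn_hop              # where the frame is recombined
        for i in range(frame_len):
            if src + i >= len(signal):
                break
            out[dst + i] += window[i] * signal[src + i]
            norm[dst + i] += window[i]
    # Divide by the summed window so overlapping regions keep unit gain.
    return [o / w if w > 1e-9 else 0.0 for o, w in zip(out, norm)]
```

For example, `ola_time_scale(samples, 2.0)` returns a signal roughly half as long as `samples`. Because the frames here are not aligned to pitch periods, this simple version can introduce audible artifacts on voiced speech, which is the problem that pitch-synchronous variants are meant to address.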
