An overlap-add technique based on waveform similarity (WSOLA) for high quality time-scale modification of speech

Verhelst, Werner; Roelands, Marc

doi:10.1109/icassp.1993.319366

Cited by 279 publications

(183 citation statements)

References 4 publications



“…Representative examples of time-domain time-scale modification techniques include Synchronized OverLap and Add (SOLA), [1] OverLap-Add technique based on Waveform Similarity (WSOLA), [2] Pitch Synchronized OverLap and Add (PSOLA) [3] [5,6] References…”

Section: 임상준 김형순 (unclassified)

A Fast Normalized Cross-Correlation Computation for WSOLA-based Speech Time-Scale Modification

Lim

Kim

2012

The Journal of the Acoustical Society of Korea


ABSTRACT: The overlap-add technique based on waveform similarity (WSOLA) is known to be an efficient, high-quality algorithm for time scaling of speech signals. The computational load of WSOLA is concentrated in the repeated normalized cross-correlation (NCC) calculation that evaluates the similarity between two signal waveforms. To reduce the computational complexity of WSOLA, this paper proposes a fast NCC computation method in which NCC is obtained from pre-calculated sum tables, eliminating the redundancy of repeated NCC calculations in adjacent regions. While the denominator of NCC has much redundancy irrespective of the time-scale factor, the numerator has less redundancy, and the amount of redundancy depends on both the time-scale factor and the optimal shift value, so it requires a more sophisticated algorithm for fast computation. The simulation results show that the proposed method reduces the WSOLA execution time by about 40%, 47% and 52% for time-scale compression, 2-times expansion and 3-times expansion, respectively, while maintaining exactly the same speech quality as conventional WSOLA.
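The sum-table idea for the NCC denominator can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a plain prefix-sum table of squared samples, covers only the denominator (the numerator is still computed directly), and the names `prefix_energy` and `ncc` are hypothetical.

```python
import math

def prefix_energy(x):
    """Build a sum table where table[i] is the sum of x[k]**2 for k < i."""
    table = [0.0]
    for v in x:
        table.append(table[-1] + v * v)
    return table

def ncc(x, table, ref, start):
    """Normalized cross-correlation between `ref` and x[start:start+len(ref)].

    The segment energy in the denominator is read from the precomputed
    table in O(1) instead of being re-summed for every candidate `start`.
    The numerator is computed directly (assumption: no numerator table).
    """
    n = len(ref)
    num = sum(r * x[start + k] for k, r in enumerate(ref))
    seg_energy = table[start + n] - table[start]
    ref_energy = sum(r * r for r in ref)
    denom = math.sqrt(seg_energy * ref_energy)
    return num / denom if denom > 0 else 0.0
```

Because overlapping candidate segments share most of their samples, the table is built once per signal and each candidate then costs two lookups and a subtraction for its energy term.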



“…That is, the original analysis phases are kept during synthesis for the transient bins. Subsequently, as the analysis window slides over the transient, the same gain reduction is applied for the transient bins, as during the onset of the transient (16). The bins are retained in the set of transient bins until their transientness decays to a value smaller than 0.5, or until the analysis frame slides completely away from the detected transient center.…”

Section: Transient Preservation (mentioning)

confidence: 99%

“…The main challenge in TSM is in simultaneously preserving the subjective quality of these distinct components. Standard time-domain TSM methods, such as the synchronized overlap-add (SOLA) [15], the waveform-similarity overlap-add [16], and the pitch-synchronous overlap-add [17] techniques, are considered to provide high-quality TSM for quasi-harmonic signals. When these methods are applied to polyphonic signals, however, only the most dominant periodic pattern of the input waveform is preserved, while other periodic components suffer from phase jump artifacts at the synthesis frame boundaries.…”

Section: Introduction (mentioning)

confidence: 99%

Audio Time Stretching Using Fuzzy Classification of Spectral Bins

Damskägg

Välimäki

2017

Applied Sciences


A novel method for audio time stretching has been developed. In time stretching, the audio signal's duration is expanded, whereas its frequency content remains unchanged. The proposed time stretching method employs the new concept of fuzzy classification of time-frequency points, or bins, in the spectrogram of the signal. Each time-frequency bin is assigned, using a continuous membership function, to three signal classes: tonalness, noisiness, and transientness. The method does not require the signal to be explicitly decomposed into different components, but instead, the computing of phase propagation, which is required for time stretching, is handled differently in each time-frequency point according to the fuzzy membership values. The new method is compared with three previous time-stretching methods by means of a listening test. The test results show that the proposed method yields slightly better sound quality for large stretching factors as compared to a state-of-the-art algorithm, and practically the same quality as a commercial algorithm. The sound quality of all tested methods is dependent on the audio signal type. According to this study, the proposed method performs well on music signals consisting of mixed tonal, noisy, and transient components, such as singing, techno music, and a jazz recording containing vocals. It performs less well on music containing only noisy and transient sounds, such as a drum solo. The proposed method is applicable to the high-quality time stretching of a wide variety of music signals.


“…The existing discontinuity between the successive segments and the misaligned interpolation of them result to artefacts and distortions that are detrimental to speech quality. Waveform similarity OLA (WSOLA) [23], on the other hand, searches for a position that has a maximal local similarity (e.g., maximise the cross-correlation function) with the last…”

Section: Fig (mentioning)

confidence: 99%

A Novel Mobility-Aware Playout Algorithm for VoIP Services

Lykourgiotis

Kotsopoulos

Dagiuklas

2017

Wireless Pers Commun


The latest explosive growth in mobile networks has resulted in an increasing interest in optimisation techniques for mobile services. Recent advances in mobile wireless networks incorporate link-layer intelligence in order to enhance the performance of network and application layers. The Media Independent Handover (MIH) standard provides a framework that can make such link-layer intelligence available to upper layers. In this paper, a novel MIH-enabled playout algorithm for Voice over Internet Protocol applications is presented that aims to compensate for the degradation of voice quality caused by handovers. To that end, link-layer triggers in conjunction with speech time-scale modification techniques are exploited to mitigate the increase in delay and jitter induced by the handover process. Results of subjective listening tests show typical gains of 0.3 on a 5-point scale of the Mean Opinion Score with respect to existing playout scheduling schemes.
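The similarity search that the citation statement for this entry attributes to WSOLA (finding a position that maximises the cross-correlation) might be sketched as follows. This is an illustrative assumption, not code from the cited papers: the function name `best_offset`, the `template` argument, and the symmetric search range `tolerance` are all hypothetical, and the score is an unnormalized cross-correlation.

```python
def best_offset(x, template, nominal, tolerance):
    """Return the shift in [-tolerance, tolerance] for which the segment of x
    starting at nominal + shift has the largest cross-correlation with
    `template`. Candidates that fall outside x are skipped."""
    n = len(template)
    best_shift, best_score = 0, float("-inf")
    for shift in range(-tolerance, tolerance + 1):
        start = nominal + shift
        if start < 0 or start + n > len(x):
            continue
        # Unnormalized cross-correlation between template and candidate.
        score = sum(t * x[start + k] for k, t in enumerate(template))
        if score > best_score:
            best_shift, best_score = shift, score
    return best_shift
```

In a full time-scale modification loop, the segment found this way would then be windowed and overlap-added to the output; that step is omitted here.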

