Reconstruction of Normal Sounding Speech for Laryngectomy Patients Through a Modified CELP Codec

Sharifzadeh, Hamid; McLoughlin, Ian; Ahmadi, Farzaneh

doi:10.1109/tbme.2010.2053369

Cited by 67 publications

(42 citation statements)

References 24 publications

“…On the other hand, the latter method is capable of significantly improving naturalness by converting acoustic parameters of EL speech into those of natural voices using statistical VC techniques [5], [6]. The use of statistics extracted from a parallel data set consisting of EL speech and natural voices makes it possible to achieve more complex conversion processes than that of other signal processing approaches, such as formant manipulation [7]. For example, it is possible to convert from a spectral parameter sequence of EL speech into F0 patterns of natural voices.…”

Section: Introduction (mentioning)

confidence: 99%

A Hybrid Approach to Electrolaryngeal Speech Enhancement Based on Noise Reduction and Statistical Excitation Generation

Tanaka, Toda, Neubig, et al. 2014

IEICE Trans. Inf. & Syst.

SUMMARY: This paper presents an electrolaryngeal (EL) speech enhancement method capable of significantly improving naturalness of EL speech while causing no degradation in its intelligibility. An electrolarynx is an external device that artificially generates excitation sounds to enable laryngectomees to produce EL speech. Although proficient laryngectomees can produce quite intelligible EL speech, it sounds very unnatural due to the mechanical excitation produced by the device. Moreover, the excitation sounds produced by the device often leak outside, adding to EL speech as noise. To address these issues, there are mainly two conventional approaches to EL speech enhancement through either noise reduction or statistical voice conversion (VC). The former approach usually causes no degradation in intelligibility but yields only small improvements in naturalness as the mechanical excitation sounds remain essentially unchanged. On the other hand, the latter approach significantly improves naturalness of EL speech using spectral and excitation parameters of natural voices converted from acoustic parameters of EL speech, but it usually causes degradation in intelligibility owing to errors in conversion. We propose a hybrid approach using a noise reduction method for enhancing spectral parameters and statistical voice conversion method for predicting excitation parameters. Moreover, we further modify the prediction process of the excitation parameters to improve its prediction accuracy and reduce adverse effects caused by unvoiced/voiced prediction errors. The experimental results demonstrate the proposed method yields significant improvements in naturalness compared with EL speech while keeping intelligibility high enough.

“…Couple of methods are available for converting whispers to normal speech [19], [20], [21], [22], [23]. The driving idea of all these methods is based on the assumption of whispers are missing some acoustic and spectral features comparing with normal speech; hence, the problem of converting whispers to normal speech is formalised as a reconstruction issue [4], [24].…”

Section: Introduction (mentioning)

confidence: 99%

“…These reconstruction methods (either training-based or nontraining) have different disadvantages such as problems in converting continuous speech (due to using phoneme switching) [20], being computationally expensive (due to using highly overlapped frames for spectral enhancement, or using jump Markov linear system for pitch and voicing parameters) [19], [4], and more importantly lack of naturalness in regenerated output (due to simplified time alignment and spectral features assumptions) [21], [23]. In this paper, we focus on a training-based approach, and propose a novel reconstruction algorithm to improve the efficiency in phonated speech regeneration.…”

Section: Introduction (mentioning)

confidence: 99%

Phonated speech reconstruction using twin mapping models

Sharifzadeh, HajiRassouliha, McLoughlin, et al. 2015

2015 IEEE International Symposium on Signal Processing and Information Technology (ISSPIT)

Self Cite

Abstract: Computational speech reconstruction algorithms have the ultimate aim of returning natural sounding speech to aphonic and dysphonic individuals. These algorithms can also be used by unimpaired speakers for communicating sensitive or private information. When the glottis loses function due to disease or surgery, aphonic and dysphonic patients retain the power of vocal tract modulation to some degree but they are unable to speak anything more than hoarse whispers without prosthetic aid. While whispering can be seen as a natural and secondary aspect of speech communications for most people, it becomes the primary mechanism of communications for those who have impaired voice production mechanisms, such as laryngectomees. In this paper, by considering the current limitations of speech reconstruction methods, a novel algorithm for converting whispers to normal speech is proposed and the efficiency of the algorithm is discussed. The proposed algorithm relies upon twin mapping models and makes use of artificially generated whispers (called whisperised speech) to regenerate natural phonated speech from whispers. Through a training-based approach, the mapping models exploit whisperised speech to overcome frame to frame time alignment problem in the speech reconstruction process.

“…However, whispering is usually too weak in volume to be practical in everyday conversation. To overcome these problems researchers have also attempted to capture whispered speech and re-synthesize normal speech externally [8]. However, this approach is sensitive to background noise in the environment.…”

Section: Introduction (mentioning)

confidence: 99%

Speech Driven by Artificial Larynx: Potential Advancement Using Synthetic Pitch Contours

Jian, 2015

Universal Access in Human-Computer Interaction. Access to Learning, Health and Well-Being

Abstract. Despite a long history of development, the speech qualities achieved with artificial larynx devices are limited. This paper explores recent advances in prosodic speech processing and technology and assesses their potentials in improving the quality of speech with an artificial larynx, in particular tone and intonation through pitch variation. Three approaches are discussed: manual pitch control, automatic pitch control and re-synthesized speech.
