SUMMARY This paper presents a novel method of enhancing esophageal speech using statistical voice conversion. Esophageal speech is one of the alternative speaking methods for laryngectomees. Although it does not require any external devices, the generated voices usually sound unnatural compared with normal speech. To improve the intelligibility and naturalness of esophageal speech, we propose a voice conversion method from esophageal speech into normal speech. A spectral parameter and excitation parameters of target normal speech are separately estimated from a spectral parameter of the esophageal speech based on Gaussian mixture models. The experimental results demonstrate that the proposed method yields significant improvements in intelligibility and naturalness. We also apply one-to-many eigenvoice conversion to esophageal speech enhancement to make it possible to flexibly control the voice quality of enhanced speech.
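As an illustration of the Gaussian-mixture-based estimation mentioned in this summary, the following is a minimal sketch of a frame-wise conversion under a joint-density GMM. It is not the authors' implementation: it uses the simple minimum mean-square error mapping E[y | x], which may differ from the estimation procedure used in the paper, and it assumes the GMM parameters have already been trained. The function name `gmm_convert` and all parameter names are hypothetical.

```python
import numpy as np


def gmm_convert(x, weights, mu_x, mu_y, cov_xx, cov_yx):
    """Map one source frame x to the expected target frame E[y | x].

    Assumed (hypothetical) parameter layout for M mixtures:
      weights: (M,)          mixture weights
      mu_x:    (M, Dx)       source mean vectors
      mu_y:    (M, Dy)       target mean vectors
      cov_xx:  (M, Dx, Dx)   source covariance matrices
      cov_yx:  (M, Dy, Dx)   target-source cross-covariance matrices
    """
    n_mix = len(weights)
    log_post = np.empty(n_mix)
    cond_means = []
    for m in range(n_mix):
        diff = x - mu_x[m]
        solved = np.linalg.solve(cov_xx[m], diff)  # cov_xx^-1 (x - mu_x)
        _, logdet = np.linalg.slogdet(cov_xx[m])
        # Log of weight * N(x; mu_x, cov_xx), omitting the constant
        # term that is identical for every mixture component.
        log_post[m] = np.log(weights[m]) - 0.5 * (logdet + diff @ solved)
        # Conditional mean of y given x for this mixture component.
        cond_means.append(mu_y[m] + cov_yx[m] @ solved)
    # Normalize in the log domain for numerical stability.
    post = np.exp(log_post - log_post.max())
    post /= post.sum()
    # Posterior-weighted sum of the per-mixture conditional means.
    return post @ np.array(cond_means)
```

In the setting described above, `x` would be a spectral parameter vector of esophageal speech, and separate models of this form would be used for the spectral and excitation parameters of the target normal speech.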
SUMMARY The perceived age of a singing voice is the age of the singer as perceived by the listener, and is one of the notable characteristics that determine perceptions of a song. In this paper, we describe an investigation of acoustic features that affect the perceived age, and a novel voice timbre control technique based on the perceived age for singing voice conversion (SVC). Singers can sing expressively by controlling prosody and voice timbre, but the varieties of voices that singers can produce are limited by physical constraints. Previous work has attempted to overcome this limitation through the use of statistical voice conversion. This technique makes it possible to convert the singing voice timbre of an arbitrary source singer into that of an arbitrary target singer. However, it is still difficult to intuitively control singing voice characteristics by manipulating parameters corresponding to specific physical traits, such as gender and age. In this paper, we first investigate the factors that play a part in the listener's perception of the singer's age. Then, we apply a multiple-regression Gaussian mixture model (MR-GMM) to SVC for the purpose of controlling voice timbre based on the perceived age, and we propose SVC based on a modified MR-GMM for manipulating the perceived age while maintaining the singer's individuality. The experimental results show that 1) the perceived age of singing voices corresponds relatively well to the actual age of the singer, 2) prosodic features have a larger effect on the perceived age than spectral features, 3) the individuality of a singer is influenced more heavily by segmental features than by prosodic features, and 4) the proposed voice timbre control method makes it possible to change the singer's perceived age without adversely affecting the perceived individuality. key words: singing voice, voice conversion, perceived age, spectral and prosodic features, subjective evaluations
In this study, we evaluate our proposed methods for enhancing alaryngeal speech based on statistical voice conversion techniques. Voice conversion based on a Gaussian mixture model has been applied to the conversion of alaryngeal speech into normal speech (AL-to-Speech). Moreover, one-to-many eigenvoice conversion (EVC) has also been applied to AL-to-Speech to enable the recovery of the original voice quality of laryngectomees even if only one arbitrary utterance of the original voice is available. VC/EVC-based AL-to-Speech systems have been developed for several types of alaryngeal speech, such as esophageal speech (ES), electrolaryngeal speech (EL), and body-conducted silent electrolaryngeal speech (silent EL). These proposed systems are compared with each other from various perspectives. The experimental results demonstrate that our proposed systems yield significant enhancement effects on each type of alaryngeal speech. Index Terms: alaryngeal speech, speech enhancement, voice conversion, eigenvoice conversion, performance evaluations