This paper focuses on the use of deep neural networks to convert one person's voice into another person's voice, much as a mimic would. It introduces the concept of neural networks and deploys multi-layer deep neural networks to build a framework for voice conversion. The spectral Mel-Frequency Cepstral Coefficients (MFCCs) are converted using a 10-layer deep network, while fundamental frequency (F0) conversion is accomplished by a logarithmic Gaussian normalized transformation. For re-synthesis, the converted MFCCs are subjected to inverse cepstral filtering, and the changes in F0 are incorporated using the Pitch Synchronous OverLap Add (PSOLA) algorithm. The results are compared using Mel Cepstral Distortion (MCD) for objective evaluation, and an ABX listening test is conducted for subjective assessment. The maximum improvement in MCD, 13.87%, is obtained for female-to-male conversion, and the ABX listening test indicates that female-to-male conversion is closest to the target, with an agreement of 76.2%. The method achieves reasonably good performance compared to the state of the art while making efficient use of resources and avoiding highly complex computations.
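
The abstract names two computations without giving their formulas: the logarithmic Gaussian normalized transformation for F0 and the MCD metric. The sketch below illustrates the forms commonly used in the voice conversion literature, not necessarily the exact ones used in this paper. The function names, the argument names, and the choice to exclude the 0th (energy) coefficient from MCD are assumptions made for illustration. The F0 statistics are taken to be the mean and standard deviation of log-F0 over the voiced frames of each speaker.

```python
import numpy as np

def convert_f0(f0_src, src_mean, src_std, tgt_mean, tgt_std):
    """Log-Gaussian normalized F0 transformation (commonly used form).

    Means and standard deviations are those of log-F0 for each speaker.
    Frames with F0 == 0 are treated as unvoiced and left at 0.
    """
    f0_src = np.asarray(f0_src, dtype=float)
    f0_conv = np.zeros_like(f0_src)
    voiced = f0_src > 0
    log_f0 = np.log(f0_src[voiced])
    # Normalize with source statistics, then rescale with target statistics.
    f0_conv[voiced] = np.exp(tgt_mean + (tgt_std / src_std) * (log_f0 - src_mean))
    return f0_conv

def mel_cepstral_distortion(mfcc_ref, mfcc_conv):
    """Mean MCD in dB between two time-aligned (frames x coefficients) arrays.

    Assumption: column 0 holds the energy coefficient and is excluded.
    """
    ref = np.asarray(mfcc_ref, dtype=float)
    conv = np.asarray(mfcc_conv, dtype=float)
    diff = ref[:, 1:] - conv[:, 1:]
    per_frame = (10.0 / np.log(10.0)) * np.sqrt(2.0 * np.sum(diff ** 2, axis=1))
    return float(per_frame.mean())
```

For example, with equal log-F0 standard deviations and a source mean of log(200) against a target mean of log(100), a voiced source frame at 200 Hz maps to 100 Hz. Two identical MFCC sequences give an MCD of 0 dB. Both functions assume that the source and target frames have already been time-aligned, for instance by dynamic time warping.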