“…Voice conversion has made major strides in speech quality and speaker similarity. Various approaches have been proposed, including Gaussian mixture models (GMMs) [3,4,5], frequency warping approaches [6,7,8], exemplar-based methods [9,10,11], and neural-network-based methods [12,13,14,15,16,17,18,19,20,21]. Recently, deep-learning-based disentanglement of speaker and linguistic content representations for voice conversion [22,23,24,25] has received much attention.…”