“…Several approaches to protect speaker privacy are based on digital signal processing (DSP) methods [11], [12], [14], [15], [16], [17], [18], which modify instantaneous speech characteristics such as the pitch, spectral envelope, and time scaling. State-of-the-art anonymization approaches have borrowed ideas from neural speech conversion and synthesis, mainly focusing on disentangled latent representation learning [10], [19], [20], [21], [22], [23], [24], [25] via two hypotheses.…”