Reversible speaker de-identification using pre-trained transformation functions

Magariños, Carmen; López-Otero, Paula; Docío-Fernández, Laura; Rodriguez-Banga, Eduardo; Erro, Daniel; Garcı́a-Mateo, Carmen

doi:10.1016/j.csl.2017.05.001

Cited by 32 publications

(25 citation statements)

References 19 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…The second approach for speaker de-identification used in this work was first presented in [11], and it consists in using a bunch of FW + AS functions pre-trained on a multi-speaker database. Given…”

Section: Pre-trained Transformation Functionsmentioning

confidence: 99%

“…A plausible solution to this issue is the use of de-identification, which is a process by which a data custodian alters or removes identifying information from a dataset, making it harder for users of the data to determine the identities of the data subjects [6]. Speaker deidentification is usually carried out by either performing automatic speech recognition (ASR) followed by a text-to-speech (TTS) system [7] or applying voice conversion techniques [8][9][10][11]. The latter approach is more extended since it allows the recovery of the original signal and, in addition, it does not rely on the availability and performance of ASR and TTS modules for a given language.…”

Section: Introductionmentioning

confidence: 99%

“…For that purpose, speaker de-identification and depression detection algorithms are used in a series of experiments that aim at finding out whether the estimation of depression declines when dealing with de-identified speech. The assessed speaker de-identification approaches are suitable for these experiments since they can be straightforwardly applied to any speaker without having to train transformation functions between the input and target speakers [10,11]. Regarding the estimation of depression severity, an approach based on acoustic characteristics and i-vector representation combined with support vector regression is chosen [19] due to the characteristics of AVEC 2014 data and its acceptable results in this experimental framework.…”

Section: Introductionmentioning

confidence: 99%

See 2 more Smart Citations

Influence of speaker de‐identification in depression detection

López-Otero

Magariños

Docío-Fernández

et al. 2017

IET signal process.

Self Cite

View full text Add to dashboard Cite

Depression is a common mental disorder that is usually addressed by outpatient treatments that favour patients' inclusion in the society. This raises the need for tools to remotely monitor the emotional state of the patients, which can be carried out via telephone or the Internet using speech processing approaches. However, these strategies lead to privacy concerns caused by the transmission of the patients' speech and its subsequent storage in servers. The use of speech deidentification to protect the privacy of these patients seems straightforward, but the influence of this procedure in the manifestation of the disease in the patients' speech has not been addressed yet. Hence, this study evaluates the performance of an automatic depression level estimation system when dealing with original and de-identified speech, in order to analyse the influence of the de-identification procedure in the detection of depression. Two de-identification approaches based on voice transformation via frequency warping and amplitude scaling are assessed, which can be applied to any speaker without additional training. Experiments carried out in the framework of the audio/visual emotion challenge 2014 show that the proposed de-identification approaches achieve promising de-identification results at the expense of a slight degradation of depression detection.

show abstract

Section: Pre-trained Transformation Functionsmentioning

confidence: 99%

Section: Introductionmentioning

confidence: 99%

Section: Introductionmentioning

confidence: 99%

See 1 more Smart Citation

Influence of speaker de‐identification in depression detection

López-Otero

Magariños

Docío-Fernández

et al. 2017

IET signal process.

Self Cite

View full text Add to dashboard Cite

show abstract

“…The phase vocoder and the standard vocal tract length normalization were used to conceal the gender of the speaker [6] showing that for such a de-identification system the preceding gender recognition is necessary. Speaker de-identification gender conversion was investigated also in [7] where the spectral amplitude scaling was combined with the piecewise linear transformation and the linear modification of the fundamental frequency (F 0 ) giving 96.9% de-identification accuracy by the speaker identification in the i -vector space. The pre-calculated voice transformations based on GMM mapping and harmonic plus stochastic models with the target synthetic HMM-based voice were used for successful de-identification in the open set comparable to the closed-set de-identification 87.4% open-set vs 91% closed-set de-identification rate.…”

Section: Introductionmentioning

confidence: 99%

Evaluation of speaker de-identification based on voice gender and age conversion

Přibil

Přibilová

Matoušek

2018

Journal of Electrical Engineering

View full text Add to dashboard Cite

Two basic tasks are covered in this paper. The first one consists in the design and practical testing of a new method for voice de-identification that changes the apparent age and/or gender of a speaker by multi-segmental frequency scale transformation combined with prosody modification. The second task is aimed at verification of applicability of a classifier based on Gaussian mixture models (GMM) to detect the original Czech and Slovak speakers after applied voice deidentification. The performed experiments confirm functionality of the developed gender and age conversion for all selected types of de-identification which can be objectively evaluated by the GMM-based open-set classifier. The original speaker detection accuracy was compared also for sentences uttered by German and English speakers showing language independence of the proposed method.

show abstract

“…This issue can be overcome by means of deidentification, which is a process by which a data custodian alters or removes identifying information from a dataset, making it harder for users of the data to determine the identities of the data subjects [4]. The most extended technique for speaker deidentification consists in applying voice conversion techniques [5,6,7,8] in order to modify the voice characteristics of a speaker in a way that, afterwards, they sound like a different speaker.…”

Section: Introductionmentioning

confidence: 99%

Depression Detection Using Automatic Transcriptions of De-Identified Speech

López-Otero¹,

Docío-Fernández²,

Abad³

et al. 2017

Interspeech 2017

Self Cite

View full text Add to dashboard Cite

Depression is a mood disorder that is usually addressed by outpatient treatments in order to favour patient's inclusion in society. This leads to a need for novel automatic tools exploiting speech processing approaches that can help to monitor the emotional state of patients via telephone or the Internet. However, the transmission, processing and subsequent storage of such sensitive data raises several privacy concerns. Speech deidentification can be used to protect the patients' identity. Nevertheless, these techniques modify the speech signal, eventually affecting the performance of depression detection approaches based on either speech characteristics or automatic transcriptions. This paper presents a study on the influence of speech de-identification when using transcription-based approaches for depression detection. To this effect, a system based on the global vectors method for natural language processing is proposed. In contrast to previous works, two main sources of nuisance have been considered: the de-identification process itself and the transcription errors introduced by the automatic recognition of the patients' speech. Experimental validation on the DAIC-WOZ corpus reveals very promising results, obtaining only a slight performance degradation with respect to the use of manual transcriptions.

show abstract

Reversible speaker de-identification using pre-trained transformation functions

Cited by 32 publications

References 19 publications

Influence of speaker de‐identification in depression detection

Influence of speaker de‐identification in depression detection

Evaluation of speaker de-identification based on voice gender and age conversion

Depression Detection Using Automatic Transcriptions of De-Identified Speech

Contact Info

Product

Resources

About