ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2020
DOI: 10.1109/icassp40776.2020.9053172
Privacy Aware Acoustic Scene Synthesis Using Deep Spectral Feature Inversion

Abstract: Gathering information about the acoustic environment of urban areas is now possible and is studied in many major cities around the world. Part of this research is to find ways to inform citizens about their sound environment while ensuring their privacy. In this paper we study how this application can be cast as a feature inversion problem. We argue that using deep learning techniques to solve this problem allows us to produce sound sketches that are both representative and privacy aware. Experiments done considerin…

Cited by 9 publications (5 citation statements) · References 15 publications (17 reference statements)
“…A 512-dimensional x-vector encoding the speaker is extracted using a TDNN trained on VoxCeleb-1,2 with Kaldi. In Step 2, for every source x-vector, an anonymized x-vector is computed by finding the N farthest x-vectors in an external pool (LibriTTS train-other-500) according to the PLDA distance, and by averaging N* randomly selected vectors among them 5 . In Step 3, an SS AM generates Mel-filterbank features given the anonymized x-vector and the F0+BN features, and a neural source-filter (NSF) waveform model [35] outputs a speech signal given the anonymized x-vector, the F0, and the generated Mel-filterbank features.…”
Section: Anonymization Baselines
confidence: 99%
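The selection step of the anonymization pipeline quoted above (find the N farthest pool x-vectors, then average N* randomly chosen ones) can be sketched as follows. This is a hedged illustration, not the cited system: plain Euclidean distance stands in for the PLDA distance, which requires a trained PLDA model, and the pool here is random data rather than LibriTTS x-vectors.

```python
import numpy as np

def anonymize_xvector(source, pool, n_farthest=200, n_avg=100, seed=None):
    """Select the n_farthest pool vectors from `source` (Euclidean distance
    as a stand-in for PLDA), then average n_avg randomly chosen ones among
    them to form a pseudo-speaker x-vector."""
    rng = np.random.default_rng(seed)
    dists = np.linalg.norm(pool - source, axis=1)   # distance to each pool x-vector
    farthest = np.argsort(dists)[-n_farthest:]      # indices of the N farthest
    chosen = rng.choice(farthest, size=n_avg, replace=False)  # N* random picks
    return pool[chosen].mean(axis=0)                # averaged pseudo-speaker

# toy usage with 512-dimensional vectors, matching the x-vector size above
pool = np.random.default_rng(0).normal(size=(1000, 512))
anon = anonymize_xvector(np.zeros(512), pool, n_farthest=200, n_avg=100, seed=1)
print(anon.shape)  # (512,)
```

Averaging many distant vectors, rather than picking a single far speaker, makes the resulting pseudo-speaker hard to link back to any real identity in the pool.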
“…Current methods fall into four categories: deletion, encryption, distributed learning, and anonymization. Deletion methods [4,5] are meant for ambient sound analysis. They delete or obfuscate any overlapping speech to the point where no information about it can be recovered.…”
Section: Introduction
confidence: 99%
“…Deletion techniques aim at ambient sound analysis. When recording sound in public places, speech is obfuscated such that no information can be recovered [101]. This can be seen as a technique similar to blurring all faces in video surveillance of public places.…”
Section: A Privacy Preservation
confidence: 99%
“…In recent years, some methods of environmental sound synthesis using deep learning approaches have been developed [1]-[3]. Environmental sound synthesis has great potential for many applications, such as supporting movie and game production [2], [4], and data augmentation for sound event detection and scene classification [5], [6]. As one of the methods of environmental sound synthesis, environmental sound synthesis using sound event labels as input to the system [1] has been proposed.…”
Section: Introduction
confidence: 99%