ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp43922.2022.9747421
Temporal Dynamic Convolutional Neural Network for Text-Independent Speaker Verification and Phonemic Analysis

Cited by 18 publications (14 citation statements)
References 18 publications
“…In this work, we select ResNet-34 [47] with Squeeze-and-Excitation blocks [48], which is state-of-the-art network in sound event recognition tasks [21,49,50] and speaker recognition tasks [20,[51][52][53], in order to focus only on the custom sound events and customization methods. The detailed structure is described in Table II.…”
Section: Sound Event Recognition Network Architecture
confidence: 99%
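The excerpt above builds on ResNet-34 with Squeeze-and-Excitation blocks. As a minimal PyTorch sketch of an SE block, the channel count and reduction ratio below are illustrative assumptions, not values from the cited work:

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation: reweight channels using globally pooled context."""
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)            # squeeze: global average pool
        self.fc = nn.Sequential(                        # excitation: bottleneck MLP
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                                    # per-channel rescaling

# usage on a (batch, channel, freq, time) feature map
feat = torch.randn(4, 64, 40, 100)
out = SEBlock(64)(feat)
```

In a ResNet-34 variant, such a block is typically inserted after the second convolution of each residual block, before the skip connection is added.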
“…CRNN with transformer [10,11] and conformer [12] widely used in automatic speech recognition achieved state-of-the-art performance in SED [13][14][15][16][17]. CRNN with frequency dynamic convolution, which is the content-adaptive model [18][19][20], improved SED performance by considering frequency dependencies as well as temporal dependencies [21]. In addition, data augmentation methods [22][23][24] improved not only performance but also robustness of SED model.…”
Section: Introduction
confidence: 99%
“…In the audio domain, recent developments of dynamic convolutions involve temporal dynamic convolutions (TDY) [44] and frequency dynamic convolutions (FDY) [45]. TDY dynamically adapts the filters along the time axis to consider time-varying characteristics of speech; FDY has been shown to improve sound event detection by dynamically adapting the filters along the frequency axis, addressing the fact that the frequency dimension is not shift-invariant.…”
Section: B. Dynamic CNN Components
confidence: 99%
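Since this excerpt summarizes how TDY adapts kernels along the time axis, a rough sketch of a temporal dynamic convolution layer is given below. It follows the generic dynamic-convolution recipe (K basis kernels mixed by per-frame attention weights); the layer sizes, frequency-average pooling for the attention branch, and all names are assumptions for illustration, not the authors' exact implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TemporalDynamicConv2d(nn.Module):
    """2D convolution whose effective kernel varies per time frame:
    K basis kernels are combined with attention weights computed for each frame."""
    def __init__(self, in_ch: int, out_ch: int, kernel_size: int = 3, n_basis: int = 4):
        super().__init__()
        self.basis = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, kernel_size, padding=kernel_size // 2)
            for _ in range(n_basis)
        )
        # per-frame attention over the basis kernels, computed from
        # frequency-averaged input features
        self.attn = nn.Conv1d(in_ch, n_basis, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:   # x: (batch, ch, freq, time)
        ctx = x.mean(dim=2)                                # (batch, ch, time)
        w = F.softmax(self.attn(ctx), dim=1)               # (batch, K, time)
        w = w.unsqueeze(2).unsqueeze(3)                    # (batch, K, 1, 1, time)
        outs = torch.stack([conv(x) for conv in self.basis], dim=1)
        return (w * outs).sum(dim=1)                       # time-varying kernel mixture

feat = torch.randn(2, 32, 40, 100)                         # (batch, ch, mel, frames)
y = TemporalDynamicConv2d(32, 64)(feat)                    # -> (2, 64, 40, 100)
```

Because convolution is linear in the kernel weights, mixing the K convolution outputs per frame is equivalent to convolving with a per-frame mixture kernel; a frequency dynamic variant would instead compute the attention weights per frequency bin.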
“…These features mainly included the X-vector learned by a Time-Delay Neural Network (TDNN) [19]- [23] or an Emphasized Channel Attention, Propagation and Aggregation in TDNN (ECAPA-TDNN) [24]; the R-vector learned by a Residual Network with 34 layers (ResNet34) [25]; the S-vector learned by a Transformer [26]. In addition, other kinds of neural networks were adopted to learn deep embeddings [27]- [35], such as temporal dynamic convolutional neural network [31], Attentive Multi-scale Convolutional Recurrent Network (AMCRN) [33], Siamese neural network [34], and long short-term memory network [35].…”
Section: Related Work
confidence: 99%
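The last excerpt surveys embedding extractors such as the TDNN-based x-vector. A condensed sketch of that style of extractor (dilated 1D convolutions over frames followed by statistics pooling) is shown below; the layer widths and embedding size are illustrative assumptions, not the cited systems' configurations:

```python
import torch
import torch.nn as nn

class XVectorStyleExtractor(nn.Module):
    """TDNN-style frame encoder + statistics pooling -> fixed-length utterance embedding."""
    def __init__(self, feat_dim: int = 40, emb_dim: int = 192):
        super().__init__()
        self.frame_layers = nn.Sequential(
            nn.Conv1d(feat_dim, 512, kernel_size=5, dilation=1), nn.ReLU(),
            nn.Conv1d(512, 512, kernel_size=3, dilation=2), nn.ReLU(),
            nn.Conv1d(512, 512, kernel_size=3, dilation=3), nn.ReLU(),
        )
        self.embedding = nn.Linear(2 * 512, emb_dim)       # mean and std are concatenated

    def forward(self, x: torch.Tensor) -> torch.Tensor:    # x: (batch, feat_dim, frames)
        h = self.frame_layers(x)
        stats = torch.cat([h.mean(dim=2), h.std(dim=2)], dim=1)  # statistics pooling
        return self.embedding(stats)                        # speaker embedding

emb = XVectorStyleExtractor()(torch.randn(3, 40, 200))      # -> (3, 192)
```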