2020
DOI: 10.1007/s10044-020-00921-5
|View full text |Cite
|
Sign up to set email alerts
|

Multi-channel spectrograms for speech processing applications using deep learning methods

Abstract: Time–frequency representations of the speech signals provide dynamic information about how the frequency component changes with time. In order to process this information, deep learning models with convolution layers can be used to obtain feature maps. In many speech processing applications, the time–frequency representations are obtained by applying the short-time Fourier transform and using single-channel input tensors to feed the models. However, this may limit the potential of convolutional networks to lea… Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
3
1
1

Citation Types

0
27
0

Year Published

2022
2022
2024
2024

Publication Types

Select...
8
1
1

Relationship

0
10

Authors

Journals

citations
Cited by 54 publications
(27 citation statements)
references
References 27 publications
0
27
0
Order By: Relevance
“…The experimental results show that MGANet is superior to the most advanced baseline. In future work, it could be expanded to emotion recognition, speech processing, non-stationary signal analysis, and other fields [ 28 , 29 , 30 ].…”
Section: Related Workmentioning
confidence: 99%
“…The experimental results show that MGANet is superior to the most advanced baseline. In future work, it could be expanded to emotion recognition, speech processing, non-stationary signal analysis, and other fields [ 28 , 29 , 30 ].…”
Section: Related Workmentioning
confidence: 99%
“…We are expecting that each branch will become an expert at distinguishing sound from a certain string. e characteristics recovered by a common convolutional layer will be used by all six branches [30][31][32]. is can be better understood by looking at the summarized plot of our model given in Figure 3.…”
Section: Model Architecturementioning
confidence: 99%
“…Recently, researchers have attracted significant attention to Deep Learning (DL) [11][12][13][14] owing to its numerous applications in speech processing [15], natural language processing [16], and CV [17,18]. In video recognition [19] and large-scale images, a model of DL so-called convolutional neural network (CNN) has lately attained several encouraging results.…”
Section: Introductionmentioning
confidence: 99%