2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp.2015.7179067

Two-stage speech/music classifier with decision smoothing and sharpening in the EVS codec

Cited by 7 publications (4 citation statements)
References 4 publications
“…where g is the gradient of the GMM approach, and g[−1] is initialized to the value of −y each frame. Finally, previous frames of varying sizes (0-7) are combined according to the characteristics of the signal to determine speech/music [10].…”
Section: Context-based Methods
confidence: 99%
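As a reading aid for the smoothing step quoted above, a minimal sketch follows. It assumes a per-frame GMM score difference and uses frame energy as a stand-in for the "characteristics of the signal" that select how many previous frames (0-7) are combined; the function name, the energy-based window selection, and the zero threshold are illustrative assumptions, not the EVS algorithm itself.

import numpy as np

def context_smoothed_decision(scores, frame_energy, max_context=7):
    # Illustrative context-based smoothing (assumed form, not the EVS spec).
    # scores       : per-frame GMM score differences (positive -> music).
    # frame_energy : per-frame energy in [0, 1], used here as a stand-in for
    #                the "characteristics of the signal" that pick the window.
    decisions = []
    for t in range(len(scores)):
        # Choose how many previous frames (0-7) to combine; this mapping from
        # energy to context length is an assumption made for illustration.
        ctx = int(np.clip(max_context * (1.0 - frame_energy[t]), 0, max_context))
        combined = np.mean(scores[max(0, t - ctx):t + 1])
        decisions.append(1 if combined > 0 else 0)  # 1 = music, 0 = speech
    return np.array(decisions)

# Toy usage with random scores and energies
rng = np.random.default_rng(0)
print(context_smoothed_decision(rng.normal(size=20), rng.uniform(size=20)))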
“…Recently, further improvements in speech/music classification have been achieved by adopting machine learning techniques such as the support vector machine (SVM) [6,7], Gaussian mixture model (GMM) [8], and deep belief network (DBN) [9] for the selectable mode vocoder (SMV) codec. The speech/music classifier of the enhanced voice services (EVS) codec, the 3rd-generation partnership project (3GPP) standard speech codec for the voice-over-LTE (VoLTE) network, is also based on a GMM, but its features are computed either on the current frame or as a moving average over the current and previous frames [10]. The classifier makes a binary speech/music decision, but because music is far more diverse than speech, the task can also be viewed as a multiclass problem across musical genres.…”
Section: Introduction
confidence: 99%
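The GMM-based classification described in this statement can be pictured with the short sketch below: two diagonal-covariance GMMs score each frame's features, optionally after a moving average over the current and previous frames. The exponential-average form, parameter shapes, and zero decision threshold are assumptions made for illustration, not details of the EVS specification.

import numpy as np

def moving_average(feats, alpha=0.9):
    # Exponential moving average over frames (an assumed smoothing form).
    out = np.empty_like(feats)
    out[0] = feats[0]
    for t in range(1, len(feats)):
        out[t] = alpha * out[t - 1] + (1.0 - alpha) * feats[t]
    return out

def diag_gmm_loglik(x, weights, means, variances):
    # Log-likelihood of frames x (T, D) under a diagonal-covariance GMM
    # with K components: weights (K,), means (K, D), variances (K, D).
    diff = x[:, None, :] - means[None, :, :]
    logp = (-0.5 * np.sum(diff ** 2 / variances, axis=-1)
            - 0.5 * np.sum(np.log(2 * np.pi * variances), axis=-1)
            + np.log(weights))
    return np.logaddexp.reduce(logp, axis=1)

def classify_frames(feats, speech_gmm, music_gmm, smooth=True):
    # 1 = music, 0 = speech; features are averaged over previous frames when
    # smooth=True, mirroring the "moving average" mentioned in the quote.
    x = moving_average(feats) if smooth else feats
    d = diag_gmm_loglik(x, *music_gmm) - diag_gmm_loglik(x, *speech_gmm)
    return (d > 0).astype(int)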
“…$c_t = f_t \odot c_{t-1} + i_t \odot g_t$ (10) and $h_t = o_t \odot \phi(c_t)$ (11), where $W_{ij}$ are the weight matrices, $\odot$ is the point-wise product with the gate value, $b_j$ is the bias, $\phi(x)$ is the activation function, and $\sigma(x)$ is the logistic sigmoid. As shown in Figure 3, LSTM units are gathered together to form layers and are connected at each time step.…”
Section: Proposed LSTM-based Speech/Music Classification
confidence: 99%
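To make Eqs. (10)-(11) concrete, here is a minimal single-step LSTM sketch. The quoted snippet shows only the cell-state and hidden-state updates, so the gate pre-activations, the parameter packing, and the tanh choice for the activation are standard-LSTM assumptions rather than details taken from the cited paper.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b, phi=np.tanh):
    # One LSTM time step following Eqs. (10)-(11) above.
    # The gate layout (stacked order i, f, o, g) is an assumption.
    z = W @ x_t + U @ h_prev + b          # stacked pre-activations, shape (4H,)
    H = h_prev.shape[0]
    i = sigmoid(z[0:H])                   # input gate
    f = sigmoid(z[H:2 * H])               # forget gate
    o = sigmoid(z[2 * H:3 * H])           # output gate
    g = phi(z[3 * H:4 * H])               # candidate cell state
    c = f * c_prev + i * g                # Eq. (10): c_t = f ⊙ c_{t-1} + i ⊙ g_t
    h = o * phi(c)                        # Eq. (11): h_t = o ⊙ φ(c_t)
    return h, c

# Toy usage: D-dimensional input, H hidden units
D, H = 6, 4
rng = np.random.default_rng(0)
W, U, b = rng.normal(size=(4 * H, D)), rng.normal(size=(4 * H, H)), np.zeros(4 * H)
h, c = lstm_step(rng.normal(size=D), np.zeros(H), np.zeros(H), W, U, b)
print(h)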