2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp.2015.7179067

Two-stage speech/music classifier with decision smoothing and sharpening in the EVS codec

Cited by 7 publications (4 citation statements)
References 4 publications
“…where g is the gradient of the GMM approach, and g[−1] is initialized to the value of −y each frame. Finally, previous frames of varying sizes (0-7) are combined according to the characteristics of the signal to determine speech/music [10].…”
Section: Context-based Methods
confidence: 99%
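As a reading aid for the smoothing step quoted above, a minimal sketch follows. It assumes a per-frame GMM score difference and uses frame energy as a stand-in for the "characteristics of the signal" that select how many previous frames (0-7) are combined; the function name, the energy-based window selection, and the zero threshold are illustrative assumptions, not the EVS algorithm itself.

import numpy as np

def context_smoothed_decision(scores, frame_energy, max_context=7):
    # Illustrative context-based smoothing (assumed form, not the EVS spec).
    # scores       : per-frame GMM score differences (positive -> music).
    # frame_energy : per-frame energy in [0, 1], used here as a stand-in for
    #                the "characteristics of the signal" that pick the window.
    decisions = []
    for t in range(len(scores)):
        # Choose how many previous frames (0-7) to combine; this mapping from
        # energy to context length is an assumption made for illustration.
        ctx = int(np.clip(max_context * (1.0 - frame_energy[t]), 0, max_context))
        combined = np.mean(scores[max(0, t - ctx):t + 1])
        decisions.append(1 if combined > 0 else 0)  # 1 = music, 0 = speech
    return np.array(decisions)

# Toy usage with random scores and energies
rng = np.random.default_rng(0)
print(context_smoothed_decision(rng.normal(size=20), rng.uniform(size=20)))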
“…Recently, further improvements in speech/music classification have been achieved by adopting machine learning techniques such as the support vector machine (SVM) [6,7], Gaussian mixture model (GMM) [8], and deep belief network (DBN) [9] for the selectable mode vocoder (SMV) codec. The speech/music classifier of the enhanced voice services (EVS) codec, the 3rd-generation partnership project (3GPP) standard speech codec for the voice-over-LTE (VoLTE) network, is also based on a GMM, but its features are computed either on the current frame or as a moving average over the current and previous frames [10]. The classifier makes a binary speech/music decision, but because music is far more diverse than speech, the task can also be viewed as a multiclass problem across musical genres.…”
Section: Introduction
confidence: 99%
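The GMM-based classification described in this statement can be pictured with the short sketch below: two diagonal-covariance GMMs score each frame's features, optionally after a moving average over the current and previous frames. The exponential-average form, parameter shapes, and zero decision threshold are assumptions made for illustration, not details of the EVS specification.

import numpy as np

def moving_average(feats, alpha=0.9):
    # Exponential moving average over frames (an assumed smoothing form).
    out = np.empty_like(feats)
    out[0] = feats[0]
    for t in range(1, len(feats)):
        out[t] = alpha * out[t - 1] + (1.0 - alpha) * feats[t]
    return out

def diag_gmm_loglik(x, weights, means, variances):
    # Log-likelihood of frames x (T, D) under a diagonal-covariance GMM
    # with K components: weights (K,), means (K, D), variances (K, D).
    diff = x[:, None, :] - means[None, :, :]
    logp = (-0.5 * np.sum(diff ** 2 / variances, axis=-1)
            - 0.5 * np.sum(np.log(2 * np.pi * variances), axis=-1)
            + np.log(weights))
    return np.logaddexp.reduce(logp, axis=1)

def classify_frames(feats, speech_gmm, music_gmm, smooth=True):
    # 1 = music, 0 = speech; features are averaged over previous frames when
    # smooth=True, mirroring the "moving average" mentioned in the quote.
    x = moving_average(feats) if smooth else feats
    d = diag_gmm_loglik(x, *music_gmm) - diag_gmm_loglik(x, *speech_gmm)
    return (d > 0).astype(int)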
“…$c_t = f_t \odot c_{t-1} + i_t \odot g_t$ (10) and $h_t = o_t \odot \phi(c_t)$ (11), where $W_{ij}$ are the weight matrices, $\odot$ is the point-wise product with the gate value, $b_j$ is the bias, $\phi(x)$ is the activation function, and $\sigma(x)$ is the logistic sigmoid. As shown in Figure 3, LSTM units are gathered together to form layers and are connected at each time step.…”
Section: Proposed LSTM-based Speech/Music Classification
confidence: 99%
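To make Eqs. (10)-(11) concrete, here is a minimal single-step LSTM sketch. The quoted snippet shows only the cell-state and hidden-state updates, so the gate pre-activations, the parameter packing, and the tanh choice for the activation are standard-LSTM assumptions rather than details taken from the cited paper.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b, phi=np.tanh):
    # One LSTM time step following Eqs. (10)-(11) above.
    # The gate layout (stacked order i, f, o, g) is an assumption.
    z = W @ x_t + U @ h_prev + b          # stacked pre-activations, shape (4H,)
    H = h_prev.shape[0]
    i = sigmoid(z[0:H])                   # input gate
    f = sigmoid(z[H:2 * H])               # forget gate
    o = sigmoid(z[2 * H:3 * H])           # output gate
    g = phi(z[3 * H:4 * H])               # candidate cell state
    c = f * c_prev + i * g                # Eq. (10): c_t = f ⊙ c_{t-1} + i ⊙ g_t
    h = o * phi(c)                        # Eq. (11): h_t = o ⊙ φ(c_t)
    return h, c

# Toy usage: D-dimensional input, H hidden units
D, H = 6, 4
rng = np.random.default_rng(0)
W, U, b = rng.normal(size=(4 * H, D)), rng.normal(size=(4 * H, H)), np.zeros(4 * H)
h, c = lstm_step(rng.normal(size=D), np.zeros(H), np.zeros(H), W, U, b)
print(h)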