2018
DOI: 10.1371/journal.pone.0194151
A hybrid technique for speech segregation and classification using a sophisticated deep neural network

Abstract: Recent research on speech segregation and music fingerprinting has led to improvements in speech segregation and music identification algorithms. Speech and music segregation generally involves the identification of music followed by speech segregation. However, music segregation becomes a challenging task in the presence of noise. This paper proposes a novel method of speech segregation for unlabelled stationary noisy audio signals using the deep belief network (DBN) model. The proposed method successfully se…

Cited by 28 publications (18 citation statements)
References 30 publications (22 reference statements)
“…All audio was converted into mono, as done in various applications (e.g. Bergler et al., 2019; Qazi, Tabassam Nawaz, Rashid, & Habib, 2018; Stowell, Petrusková, Šálek, & Linhart, 2019). By cross-referencing the time intervals of each segment with the logged start and end times of known gibbon phrases, each segment was labelled as (a) a "presence", if its time interval completely contained the interval of at least one labelled phrase, (b) an "absence", if its time interval contained no part of any phrase, or (c) a "partial presence", if its time interval intersected but did not completely contain the interval of at least one labelled phrase (Figure 2).…”
Section: Discussion
confidence: 99%
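
The interval rule quoted above maps cleanly onto a small helper. The sketch below is a hypothetical illustration, not code from the cited study: it assumes segments and phrases are given as (start, end) times in seconds, and it checks full containment first so that a segment fully containing any phrase is labelled a "presence" even if it also partially overlaps others.

def label_segment(segment, phrases):
    # segment: (start, end) of one audio chunk; phrases: list of (start, end)
    # intervals of logged phrases. Returns "presence", "partial presence",
    # or "absence" following the rule described in the statement above.
    seg_start, seg_end = segment
    # (a) "presence": the segment completely contains at least one phrase.
    if any(seg_start <= p_start and p_end <= seg_end for p_start, p_end in phrases):
        return "presence"
    # (c) "partial presence": the segment intersects a phrase without containing it.
    if any(max(seg_start, p_start) < min(seg_end, p_end) for p_start, p_end in phrases):
        return "partial presence"
    # (b) "absence": the segment contains no part of any phrase.
    return "absence"

For example, label_segment((10.0, 20.0), [(12.0, 15.0)]) returns "presence", while label_segment((10.0, 20.0), [(18.0, 25.0)]) returns "partial presence".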
“…CNNs are a variant of deep learning and a substitute for auditory representations. They are commonly combined with other machine learning techniques to derive DNNs for voice separation and identification [18,19]. The DNN and CNN are very similar, but the difference between them is that the CNN has additional feature-extracting layers.…”
Section: A CNN Model
confidence: 99%
“…These additional feature-extracting layers generate input descriptions for subsequent levels of the DNN in place of initially processed features. Each of these layers consists of convolution and max-pooling units, each operating on a part of a larger input [19,20]. The following explains the basic concepts of the CNN architecture: Convolutional layer.…”
Section: A CNN Model
confidence: 99%
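
A minimal sketch of the architecture these two statements describe: convolution and max-pooling layers act as the feature-extracting front end, and fully connected (DNN) layers sit on top. The framework (PyTorch), layer counts, and sizes below are illustrative assumptions, not values from the cited papers.

import torch
import torch.nn as nn

class SimpleCNN(nn.Module):
    # Convolution + max-pooling stages extract features from a spectrogram-like
    # input; the flattened output feeds fully connected (DNN) layers.
    def __init__(self, n_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),   # feature-extracting layer
            nn.ReLU(),
            nn.MaxPool2d(2),                               # max-pooling unit
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(                   # DNN part: fully connected layers
            nn.Flatten(),
            nn.Linear(32 * 16 * 16, 128),
            nn.ReLU(),
            nn.Linear(128, n_classes),
        )

    def forward(self, x):                                  # x: (batch, 1, 64, 64)
        return self.classifier(self.features(x))

model = SimpleCNN()
print(model(torch.randn(4, 1, 64, 64)).shape)              # torch.Size([4, 2])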
“…The image features obtained by this method are capable of producing good retrieval performance and have the advantages of fast indexing and scalability. Z. Mehmood et al. [27] proposed a new method for speech segregation of unlabelled stationary noisy audio signals using the deep belief network (DBN) model. The method can effectively separate the music signal from the noisy audio stream, remove the stationary noise using the hidden-layer separation model of the recurrent neural network (RNN), and then use the dictionary-based Fisher algorithm for speech classification.…”
Section: Related Work
confidence: 99%
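
As a rough sketch of the three-stage flow described in this statement (segregation, stationary-noise removal, then classification), the snippet below wires stand-in stages around scikit-learn's LinearDiscriminantAnalysis, which implements Fisher's linear discriminant. The DBN segregation and the RNN hidden-layer noise-removal model are represented only by placeholder functions; nothing here reproduces the cited models, and the feature shapes are arbitrary.

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def segregate_speech(mixture):
    # Placeholder: in the cited method, a DBN separates speech from the music/noise mixture.
    return mixture

def remove_stationary_noise(speech):
    # Placeholder: a toy per-frame mean subtraction stands in for the RNN hidden-layer model.
    return speech - speech.mean(axis=1, keepdims=True)

def extract_features(frames):
    # Placeholder: dictionary-coded or spectral features would be computed here.
    return frames

# Final stage: Fisher's linear discriminant (via scikit-learn) classifies the feature frames.
rng = np.random.default_rng(0)
X = extract_features(remove_stationary_noise(segregate_speech(rng.standard_normal((100, 20)))))
y = rng.integers(0, 2, size=100)            # dummy labels, for shape-checking only
clf = LinearDiscriminantAnalysis().fit(X, y)
print(clf.predict(X[:5]))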