2016
DOI: 10.1587/transinf.2015edl8168

DNN-Based Voice Activity Detection with Multi-Task Learning

Abstract: Recently, notable improvements in the voice activity detection (VAD) problem have been achieved by adopting several machine learning techniques. Among them, the deep neural network (DNN), which learns the mapping between noisy speech features and the corresponding voice activity status through its deep hidden structure, has been one of the most popular. In this letter, we propose a novel approach which enhances the robustness of DNN in mismatched noise conditions with multi-task learning (MTL) fr…

Cited by: 32 publications (16 citation statements)
References: 11 publications (7 reference statements)
“…The advantage is that the raster image system can provide a high rendering level and a more realistic head model. The disadvantage is that the time-varying motion parameters are difficult to calculate, and the raster image system is very expensive and animation renders a long time [7].…”
Section: Related Work (mentioning)
confidence: 99%
“…The structure of the attention mechanism in Tacotron's decoder is shown in Figure 5, and the specific mathematical operation process in the attention mechanism is shown in formulae (7) to (11).…”
Section: Statistical Parametric Speech Synthesis Based On Hidden Markov Chain (mentioning)
confidence: 99%
“…Finally, a multi-feature fusion classifier is established by support vector machine technology, which is used for the fatigue recognition of the driver's voice samples. Literature [11] proposed a fatigue detection method based on speech psychoacoustics, which uses the perceptual masking process in psychoacoustics to highlight the high-sensitive fatigue frequencies, and quantifies the abnormal sounds of fatigue in speech by masking the prosodic features extracted by psychoacoustic perception. Traditional research on speech signals focuses on finding information from feature engineering, such as the short-term energy of the speech signal, short-term average zero-crossing rate, pitch frequency, formant, Mel Frequency Cepstrum Coefficient (MFCC), MFCC logarithmic power spectrum, speech rate, perceptual linear prediction coefficient (Perceptual Linear Prediction, PLP), amplitude perturbation, etc.…”
Section: Related Work (mentioning)
confidence: 99%
“…) Among them, the total number of frequency bands of training samples with index m is l, and the selection of weighting factors w(1,k) is flexible. This paper applies perceptually-based weighting factors to each frequency band according to formulas (11) and (12).…”
Section:  mentioning
confidence: 99%
“…Aiming at the problem of blind source separation, the literature [11] gave a relatively complete research framework, and by analyzing the separability and uncertainty in the blind source separation algorithm, it proposed a joint diagonalization method. The BP neural network algorithm proposed in the literature [12] has become one of the most widely used neural network models. The literature [13] established an objective function based on information maximization, and constructed a unified framework of ICA algorithm on the basis of information theory.…”
Section: Related Work (mentioning)
confidence: 99%