2016
DOI: 10.1109/taslp.2015.2496226

Study of Senone-Based Deep Neural Network Approaches for Spoken Language Recognition

Cited by 79 publications (53 citation statements)
References 29 publications
“…Previous work has proven that the statistics of senones can be discriminative in languages [19,10]. We aim for a task-aware version which we name LID-senones, that are constructed from LID-features to make their statistics even more discriminative for LID.…”
Section: LID-net Structure (mentioning)
confidence: 99%
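The statement above refers to utterance-level statistics of senones. As a minimal sketch, assuming a DNN that already outputs per-frame senone posteriors, such statistics can be formed by summing the posteriors over frames (soft occupancy counts) and normalising by the number of frames. The posterior values and the function name utterance_senone_stats are hypothetical, and this is a generic illustration, not the LID-senone construction described by the citing paper.

```python
from typing import List, Sequence


def utterance_senone_stats(posteriors: Sequence[Sequence[float]]) -> List[float]:
    """Average frame-level senone posteriors into one utterance-level vector.

    posteriors[t][c] is a (hypothetical) DNN posterior P(senone c | frame t);
    each row is assumed to sum to 1, so the result also sums to 1.
    """
    if not posteriors:
        raise ValueError("need at least one frame")
    num_senones = len(posteriors[0])
    soft_counts = [0.0] * num_senones
    for frame in posteriors:
        if len(frame) != num_senones:
            raise ValueError("every frame must have the same number of senones")
        for c, p in enumerate(frame):
            soft_counts[c] += p  # zeroth-order (occupancy) statistic for senone c
    num_frames = len(posteriors)
    # Normalising by the frame count makes utterances of different lengths comparable.
    return [n / num_frames for n in soft_counts]


# Toy example: 3 frames, 4 senones (values are made up).
post = [
    [0.7, 0.1, 0.1, 0.1],
    [0.6, 0.2, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.7],
]
stats = utterance_senone_stats(post)
print(stats)
```

A language classifier could then be trained on such vectors; which back end to use is left open here.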
“…DBFs are inherently robust to different speakers, channels and background noises. Lei et al., Kenny et al. and Ferrer et al. [8,9,10] proposed collecting sufficient statistics also using structured DNNs to form an effective representation of underlying phonemes or phoneme states. It seems that DNNs are effective for either front-end frame-level feature extraction or back-end utterance-level modelling, where sufficient good quality and quantity training data is available.…”
Section: Introduction (mentioning)
confidence: 99%
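In the usual i-vector setup, the "sufficient statistics" mentioned here are the zeroth-order statistics N_c = Σ_t γ_t(c) and the centred first-order statistics F_c = Σ_t γ_t(c)(x_t − μ_c), where the DNN senone posteriors γ_t(c) stand in for GMM component posteriors. The sketch below illustrates that computation under this assumption; the function name baum_welch_stats, the toy features, and the class means μ_c are hypothetical and not taken from the cited works.

```python
from typing import List, Sequence, Tuple


def baum_welch_stats(
    features: Sequence[Sequence[float]],
    posteriors: Sequence[Sequence[float]],
    means: Sequence[Sequence[float]],
) -> Tuple[List[float], List[List[float]]]:
    """Zeroth-order N_c and centred first-order F_c statistics per senone class.

    features[t]   : acoustic feature vector x_t (dimension D)
    posteriors[t] : DNN senone posteriors gamma_t(c), one per class
    means[c]      : class mean mu_c used for centring (hypothetical here)
    """
    if len(features) != len(posteriors):
        raise ValueError("features and posteriors must cover the same frames")
    num_classes, dim = len(means), len(means[0])
    n = [0.0] * num_classes
    f = [[0.0] * dim for _ in range(num_classes)]
    for x, gamma in zip(features, posteriors):
        if len(gamma) != num_classes or len(x) != dim:
            raise ValueError("shape mismatch between frame, posteriors and means")
        for c in range(num_classes):
            n[c] += gamma[c]  # N_c = sum_t gamma_t(c)
            for d in range(dim):
                f[c][d] += gamma[c] * x[d]  # accumulates sum_t gamma_t(c) x_t
    # Centre: sum_t gamma_t(c) (x_t - mu_c) = (sum_t gamma_t(c) x_t) - N_c mu_c
    centred = [[f[c][d] - n[c] * means[c][d] for d in range(dim)]
               for c in range(num_classes)]
    return n, centred


# Toy example: 2 frames of 2-D features, 2 senone classes.
feats = [[1.0, 2.0], [3.0, 4.0]]
gammas = [[1.0, 0.0], [0.5, 0.5]]
mus = [[0.0, 0.0], [1.0, 1.0]]
N, F = baum_welch_stats(feats, gammas, mus)
print(N, F)  # prints [1.5, 0.5] [[2.5, 4.0], [1.0, 1.5]]
```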
“…Recently, deep encoders have shown effectiveness in factor extraction through a feature learning process. Examples of these features include linguistic features for unsupervised acoustic unit discovery [1], affect-salient features for speech emotion recognition (SER) [2], noise-robust speaker embeddings for speaker recognition (SRE) [3] and phonetically-aware bottleneck features for language recognition (LRE) [4]. This motivates investigations on deep factorization of speech signal [5][6][7][8][9].…”
Section: Introduction (mentioning)
confidence: 99%
“…DBFs are inherently robust to phonotactically irrelevant information. Lei et al., Kenny et al. and Ferrer et al. [7,8,9] proposed collecting sufficient statistics using a structured DNN to form effective representations from posteriors of phoneme or phoneme states. DNNs have been shown to excel when combined with phonotactic training in LID modelling; nevertheless, both the DBFs and the calculated statistics are extracted from phoneme or phoneme states, which are not always discriminative to languages.…”
Section: Introduction (mentioning)
confidence: 99%
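Deep bottleneck features (DBFs) are commonly read from a narrow hidden layer of a DNN trained to classify senones, with the layers above the bottleneck discarded at extraction time. The sketch below assumes sigmoid hidden layers and returns the bottleneck layer's linear output as the feature vector; the weights and the names affine, sigmoid and bottleneck_features are hypothetical toy values and helpers, not a trained model or any toolkit's API.

```python
import math
from typing import List, Sequence, Tuple

# A layer is (weights[out][in], bias[out]).
Layer = Tuple[Sequence[Sequence[float]], Sequence[float]]


def affine(x: Sequence[float], layer: Layer) -> List[float]:
    weights, bias = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def sigmoid(v: Sequence[float]) -> List[float]:
    return [1.0 / (1.0 + math.exp(-a)) for a in v]


def bottleneck_features(frame: Sequence[float], hidden: Sequence[Layer],
                        bottleneck: Layer) -> List[float]:
    """Forward one (possibly stacked) input frame up to the bottleneck only.

    Layers above the bottleneck (up to the senone softmax) would be used in
    training but are skipped here; the bottleneck's linear output is the DBF.
    """
    h = list(frame)
    for layer in hidden:
        h = sigmoid(affine(h, layer))
    return affine(h, bottleneck)


# Toy network (made-up weights): 3-D input -> 4 sigmoid units -> 2-D bottleneck.
hidden_layers = [
    ([[0.2, -0.1, 0.0], [0.0, 0.3, 0.1], [-0.2, 0.1, 0.2], [0.1, 0.1, 0.1]],
     [0.0, 0.1, -0.1, 0.0]),
]
bn_layer = ([[1.0, 1.0, 1.0, 1.0], [1.0, -1.0, 0.0, 0.0]], [0.0, 0.0])
dbf = bottleneck_features([1.0, 2.0, 3.0], hidden_layers, bn_layer)
print(dbf)
```

Taking the bottleneck output before any nonlinearity is one common choice; applying the activation there instead is an equally plausible variant.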