2016
DOI: 10.1109/taslp.2016.2604566

Composition of Deep and Spiking Neural Networks for Very Low Bit Rate Speech Coding

Abstract (excerpt): Contact details: http://people.idiap.ch/mcernak. … benefit from better adaptation properties and lower footprint. However, most current HMM-based VLBR systems have complex designs. The phonetic encoder, an automatic speech recognition (ASR) system, consists of acoustic HMMs, language models, and an incremental search module. Similarly, the phonetic decoder requires acoustic HMMs, including a streaming/performative HMM-based speech synthesis system and an incremental speech vocoder. Our recent work [7] also focused on HMM…
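The abstract outlines a speech-to-symbols-to-speech architecture: a phonetic encoder (ASR) turns audio into a compact stream of phonetic symbols, and a phonetic decoder (speech synthesis plus a vocoder) reconstructs audio from that stream. The Python sketch below only illustrates that composition; every name in it (PhoneticFrame, encode, decode, the 16 kHz rate) is a hypothetical placeholder, not the system described in the paper.

    # Illustrative sketch of a very-low-bit-rate pipeline: audio -> phonetic
    # encoder (ASR) -> compact symbol stream -> phonetic decoder (synthesis +
    # vocoder) -> audio. Placeholder logic only; not the paper's implementation.
    from dataclasses import dataclass
    from typing import List

    @dataclass
    class PhoneticFrame:
        phone_id: int     # index into a small phone/phonological inventory
        duration_ms: int  # coarse duration of the unit
        pitch_code: int   # quantized pitch information, if transmitted

    def encode(audio_frames: List[List[float]]) -> List[PhoneticFrame]:
        """Phonetic encoder: a real system runs acoustic models and a search
        module here; this stub emits one dummy symbol per input frame."""
        return [PhoneticFrame(phone_id=i % 40, duration_ms=10, pitch_code=0)
                for i, _ in enumerate(audio_frames)]

    def decode(symbols: List[PhoneticFrame], sample_rate: int = 16000) -> List[float]:
        """Phonetic decoder: a real system drives a synthesizer and vocoder
        from the symbols; this stub returns silence of the encoded duration."""
        samples: List[float] = []
        for s in symbols:
            samples.extend([0.0] * (sample_rate * s.duration_ms // 1000))
        return samples

    # The very low bit rate comes from how few bits the symbol stream needs per
    # second, e.g. on the order of 100 symbols/s at a handful of bits each,
    # instead of transmitting the waveform or spectral parameters directly.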



Cited by 25 publications (13 citation statements).
References 57 publications (59 reference statements).
“…As far as we know, this is only the second published work to learn an audio compression pipeline end-to-end - the previous being an obscure early attempt by Morishima et al. in 1990 [7] - and the first to compete with a contemporary standard. Cernak et al. [8] proposed a nearly end-to-end design for a very-low-bitrate, low-quality speech coder in 2016; however, their pipeline still required extraction of acoustic features and pitch (and was also quite complex, composing several different deep and spiking neural networks together). All other related designs we know of employ ANNs as a mere component of a larger hand-designed system.…”
Section: Introduction (mentioning, confidence: 99%)
“…An example use case was shown on DNN-HMM speech recognition. Beyond ASR, other applications that rely on estimation of DNN posteriors can also benefit from the proposed approach, such as spoken query detection [26], parametric speech coding [27], and linguistic parsing [28]. Thorough experiments on large speech corpora for a broad range of applications are planned for future research.…”
Section: Discussion (mentioning, confidence: 99%)
“…In addition to the phonetic posteriors, our previous studies on phonological posteriors show that they conform to a small number of unique binary structures, which are a tiny fraction of the number of permissible codes [12]. Exploiting this property enables construction of a small-size codebook for very low bit rate speech coding [12,13]. More recently, we also found structured sparsity of phonological posteriors highly effective for classification of supra-segmental linguistic events [14].…”
Section: Introduction (mentioning, confidence: 94%)
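The excerpt above refers to binarized phonological posteriors collapsing onto a small set of recurring binary patterns, so a short codebook can index them at very low bit rate. The sketch below is a generic illustration of that codebook idea on synthetic data; the 0.5 threshold, the dimensions, and the function names are assumptions for illustration, not the cited papers' actual configuration.

    # Generic illustration: collect the unique binary patterns seen in the data
    # and transmit each frame as an index into that codebook. Synthetic data and
    # placeholder names; not the cited papers' actual setup.
    import numpy as np

    def build_codebook(binary_frames: np.ndarray) -> np.ndarray:
        """Unique binary rows; if only a few patterns occur, each frame needs
        roughly log2(number of patterns) bits."""
        return np.unique(binary_frames, axis=0)

    def encode_frames(posteriors: np.ndarray, codebook: np.ndarray,
                      threshold: float = 0.5) -> np.ndarray:
        """Binarize each frame and map it to the nearest codebook entry
        (Hamming distance)."""
        binary = (posteriors > threshold).astype(np.uint8)
        distances = np.abs(binary[:, None, :].astype(int)
                           - codebook[None, :, :].astype(int)).sum(axis=2)
        return distances.argmin(axis=1)

    # Toy usage with random 20-dimensional "posteriors" for 200 frames.
    rng = np.random.default_rng(0)
    posteriors = rng.random((200, 20))
    codebook = build_codebook((posteriors > 0.5).astype(np.uint8))
    codes = encode_frames(posteriors, codebook)
    print(len(codebook), "codebook entries ->",
          int(np.ceil(np.log2(len(codebook)))), "bits per frame")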