2015
DOI: 10.1109/taslp.2015.2418577

Incremental Syllable-Context Phonetic Vocoding

Abstract: Current very low bit rate speech coders are, due to complexity limitations, designed to work off-line. This paper investigates incremental speech coding that operates in real time and incrementally (i.e., encoded speech depends only on already-uttered speech without the need of future speech information). Since human speech communication is asynchronous (i.e., different information flows being simultaneously processed), we hypothesised that such an incremental speech coder should also operate asynchronous…

Cited by 6 publications
(10 citation statements)
References 37 publications
“…Rather, we move the compression one level above, from the acoustics to the phonetic and phonological representation of the speech signal. Our previous work [7] showed that, if such a representation is inferred from the acoustics well, we could decrease the transmission rate to about 200 bit/s, however the speech quality did not achieve the quality of old LPC coders operating at 500 bit/s or less, that is, slightly lower quality than normal LPC coding at 2 kbit/s. In this work, we hypothesise that recent advances in the convergence of deep learning and speech technology could fill the gap, i.e., improving the speech quality of VLBR coding whilst still using a phonetic and phonological speech representation.…”
Section: B Coding Of Phonetic and Phonological Information
mentioning
confidence: 94%
“…Effective encoding of the F0 signal can be realised by curve fitting done on a syllable level. We thus propose to encode the continuous F0 signal using the discrete (Legendre) orthogonal polynomial (DLOP), as in [7]. To estimate syllable boundaries from the speech signal, a neuromorphic oscillatory device is used, based on modelling brain neural oscillations at syllable frequency.…”
Section: Coding Of Prosodic Information
mentioning
confidence: 99%
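The syllable-level curve fitting mentioned in the statement above can be illustrated with a small sketch. This is not the cited authors' implementation: the function names (`dlop_basis`, `encode_f0`, `decode_f0`), the polynomial order of 3, and the Gram-Schmidt construction of the basis are all assumptions made for illustration. The idea shown is only that the F0 contour of one syllable is projected onto a few discrete orthogonal polynomials, so a handful of coefficients stands in for the whole contour.

```python
def dlop_basis(n_frames, order):
    """Orthonormal discrete polynomial basis on n_frames equally spaced points.

    Built by (modified) Gram-Schmidt on the monomials 1, t, t^2, ...
    This construction is an illustrative assumption, not the cited method.
    """
    if order >= n_frames:
        raise ValueError("order must be smaller than the number of frames")
    # Map frame indices to [-1, 1] to keep the monomials well scaled.
    t = [2.0 * i / (n_frames - 1) - 1.0 if n_frames > 1 else 0.0
         for i in range(n_frames)]
    basis = []
    for k in range(order + 1):
        v = [x ** k for x in t]
        # Remove the components along the lower-order basis vectors.
        for b in basis:
            proj = sum(vi * bi for vi, bi in zip(v, b))
            v = [vi - proj * bi for vi, bi in zip(v, b)]
        norm = sum(vi * vi for vi in v) ** 0.5
        basis.append([vi / norm for vi in v])
    return basis


def encode_f0(f0_contour, order=3):
    """Project one syllable's F0 contour (Hz per frame) onto the basis."""
    basis = dlop_basis(len(f0_contour), order)
    return [sum(f * b for f, b in zip(f0_contour, vec)) for vec in basis]


def decode_f0(coeffs, n_frames):
    """Rebuild an n_frames-long contour from the transmitted coefficients."""
    basis = dlop_basis(n_frames, len(coeffs) - 1)
    return [sum(c * vec[i] for c, vec in zip(coeffs, basis))
            for i in range(n_frames)]


if __name__ == "__main__":
    # Hypothetical rising-falling contour for a single syllable.
    syllable_f0 = [110.0, 118.0, 125.0, 129.0, 130.0, 128.0, 123.0, 115.0]
    coeffs = encode_f0(syllable_f0, order=3)
    rebuilt = decode_f0(coeffs, len(syllable_f0))
    print(len(coeffs), "coefficients represent", len(syllable_f0), "frames")
    print("max error (Hz):",
          max(abs(a - b) for a, b in zip(syllable_f0, rebuilt)))
```

Because the basis is orthonormal, encoding is a set of dot products and decoding is a weighted sum. In this sketch the decoder needs the syllable length as well as the coefficients, so the length would have to be transmitted or inferred from the syllable boundaries.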
“…Leong et al (2014) show that phase relations between the phonetic and syllabic amplitude modulations, known as hierarchical phase locking and nesting or synchronization across different temporal granularity (Lakatos et al, 2005), is a good indication of the syllable stress. Intelligible speech representation with stress and accent information can be constructed by asynchronous fusion of phonetic and syllabic information (Cernak et al, 2015a).…”
Section: Cognitive Neuroscience Evidence
mentioning
confidence: 99%
“…By contrast to conventional syllabification models that work offline, humans "encode" speech in an incremental fashion, i.e., encoded speech does not depend on future temporal context (similar to causality in digital signal processing theory) (Levelt, 1993). We are therefore interested in an incremental syllabification method that can be directly applied to incremental speech processing systems such as in Cernak et al (2015). We hypothesise that a biologically plausible method would fulfill this requirement.…”
mentioning
confidence: 99%