2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2018
DOI: 10.1109/icassp.2018.8461732
F0 Estimation for DNN-Based Ultrasound Silent Speech Interfaces

Cited by 21 publications (49 citation statements)
References 12 publications
“…The optimal parameters of the CNN architecture were calculated in an earlier hyperparameter optimization for the MGC-LSP target [33]. Note that using several consecutive images as input, or applying recurrent architectures, can lead to better results [10,11,33], but here we did not apply these in order to test scenarios which are more suitable for real-time implementation. The cost function applied for the log(F0) and MGC-LSP regression task was the mean-squared error (MSE), while for the V/UV classification we used cross-entropy.…”
Section: DNN Training With the Baseline Vocoder
confidence: 99%
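The quoted statement names two cost functions: mean-squared error for the log(F0) and MGC-LSP regression targets, and cross-entropy for the voiced/unvoiced (V/UV) decision. A minimal NumPy sketch of these two losses follows; the array names are illustrative and not taken from the cited implementation.

```python
import numpy as np

def mse(y_true, y_pred):
    # Mean-squared error, as used for the log(F0) and MGC-LSP regression targets
    return np.mean((y_true - y_pred) ** 2)

def binary_cross_entropy(v_true, v_prob, eps=1e-12):
    # Cross-entropy for the binary V/UV classification target;
    # v_prob holds predicted voicing probabilities in (0, 1)
    v_prob = np.clip(v_prob, eps, 1.0 - eps)
    return -np.mean(v_true * np.log(v_prob) + (1.0 - v_true) * np.log(1.0 - v_prob))
```

In a multi-task setup such as the one described, the regression and classification losses would typically be combined (e.g. as a weighted sum) during DNN training.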
“…The main idea is to record the soundless articulatory movement and to automatically generate speech from the movement information, while the subject is not producing any sound. For this automatic conversion task, typically electromagnetic articulography (EMA) [2,3,4,5], ultrasound tongue imaging (UTI) [6,7,8,9,10,11,12,13], permanent magnetic articulography (PMA) [14,15], surface electromyography (sEMG) [16,17,18], Non-Audible Murmur (NAM) [19], or video of the lip movements [7,20] are used.…”
Section: Introduction
confidence: 99%