2018
DOI: 10.1371/journal.pone.0193521

Quality prediction of synthesized speech based on tensor structured EEG signals

Abstract: This study investigates quality prediction methods for synthesized speech using EEG. Training a predictive model using EEG is challenging due to a small number of training trials, a low signal-to-noise ratio, and a high correlation among independent variables. When a predictive model is trained with a machine learning algorithm, the features extracted from multi-channel EEG signals are usually organized as a vector and their structures are ignored even though they are highly structured signals. This study pred…

Cited by 7 publications (3 citation statements)
References 48 publications (42 reference statements)
“…This study also found which frequency band is useful in order to reduce the complexity of models which will shorten the processing time. In comparison with [9], our study tried to generalize the approach across the subjects while the previous work was done within subject. Therefore, our approach may reduce the prediction performance.…”
Section: Discussion (mentioning)
confidence: 99%
“…In [8], they proposed brain computer interface-based equation to predict quality of experience MOS, and achieved 1.00 of root mean squared error (RMSE) between actual and predicted MOS. In addition, by using tensor representation of all channels and all frequency bands, a study conducted by [9] shows that EEG signals could be used to predict MOS, valence, and arousal within the same subject. We also previously examined which EEG electrodes, frequency bands, and time length significantly represent perceived speech quality in Japanese using the generalized fisher scores [10].…”
Section: Introduction (mentioning)
confidence: 99%
“…Apart from the naturalness and understandability of contents, listening tests can also measure the distinguishability of characters or the degree of entertainment [3]. The subjective scales for rating the synthesized speech may include only a few scored parameters, such as an overall impression by a mean opinion score (MOS) describing the perceived speech quality from poor to excellent, a valence from negative to positive, and an arousal from unexcited to excited [4]. The MOS scale can be used not only for naturalness, but for different dimensions, such as affect (from negative to positive) or speaking style (from irritated to calm) as well [5].…”
Section: Introduction (mentioning)
confidence: 99%