2015
DOI: 10.1515/lp-2015-0012
Models of dataset size, question design, and cross-language speech perception for speech crowdsourcing applications

Abstract: Transcribers make mistakes. Workers recruited in a crowdsourcing marketplace, because of their varying levels of commitment and education, make more mistakes than workers in a controlled laboratory setting. Methods for compensating transcriber mistakes are desirable because, with such methods available, crowdsourcing has the potential to significantly increase the scale of experiments in laboratory phonology. This paper provides a brief tutorial on statistical learning theory, introducing the relationship betw…
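The abstract's truncated sentence presumably concerns the relationship between dataset size and estimation error, a standard topic in statistical learning theory. As an illustrative sketch only (not the paper's actual derivation), a Hoeffding-style bound gives the number of transcribed examples needed so that an empirical error rate is within epsilon of the true rate with probability at least 1 - delta:

```python
import math

def hoeffding_sample_size(epsilon, delta):
    """Smallest n such that, by Hoeffding's inequality, the empirical mean
    of n i.i.d. bounded labels deviates from the true mean by more than
    epsilon with probability at most delta:
        n >= ln(2 / delta) / (2 * epsilon**2)
    """
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))

# e.g. to estimate a transcriber error rate to within 5 points,
# 95% of the time:
n = hoeffding_sample_size(epsilon=0.05, delta=0.05)
print(n)  # 738
```

This is the distribution-free worst case; tighter, task-specific bounds (e.g. using VC dimension or empirical variance) are what a tutorial like the one described would typically develop.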

Cited by 10 publications (4 citation statements)
References 38 publications
“…The development of the speech tagging tools in turn relies on the knowledge on how human listeners -the users of TTS systems -perceive and categorise prominence [1,7]. Recent advances show that crowdsourcing methods enable to directly access human prominence judgments in a relatively short time [8,9,10].…”
Section: Introduction
confidence: 99%
“…Despite methodological debates surrounding the trade-off between experimental control and stylistic variation (Xu 2010; Wagner et al. 2015 and references therein), an increasing number of speech scientists use crowdsourcing platforms to collect linguistic data (Hasegawa-Johnson, Cole, Jyothi, & Varshney 2015). This suggests that researchers are willing to trade an increased level of variability and lack of control for quick and convenient access to large amounts of data.…”
Section: Limits and Challenges of This Approach
confidence: 99%
“…As an illustrative example, consider the problem of mismatched crowdsourcing for speech transcription, which has garnered interest in the signal processing community [4,6,9,12,14,23]. Suppose the four possibilities for a velar stop consonant to transcribe are R = { , , , }.…”
Section: Confidence Level Reporting
confidence: 99%
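The mismatched-crowdsourcing example above, where several workers each pick one of four candidate transcriptions for a velar stop, can be made concrete with a simple plurality-vote aggregator. This is a minimal sketch, not the cited papers' method; the labels "k", "g", "kh", "gh" are hypothetical stand-ins for the four transcription options elided in the quote:

```python
from collections import Counter

def aggregate_votes(votes):
    """Plurality vote over crowd transcriptions of one speech segment.

    votes: list of labels, one per worker (hypothetical stand-in labels).
    Returns (winning label, empirical confidence = winner's vote share).
    """
    counts = Counter(votes)
    label, n = counts.most_common(1)[0]
    return label, n / len(votes)

# Three of four workers heard /k/, one heard /g/:
label, confidence = aggregate_votes(["k", "g", "k", "k"])
print(label, confidence)  # k 0.75
```

More sophisticated aggregators weight each worker by an estimated reliability (e.g. EM-style models in the Dawid–Skene family), which is the kind of confidence-level reporting the quoted section is concerned with.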