2010 IEEE Spoken Language Technology Workshop
DOI: 10.1109/slt.2010.5700870

Toward better crowdsourced transcription: Transcription of a year of the Let's Go Bus Information System data

Abstract: Transcription is typically a long and expensive process. In the last year, crowdsourcing through Amazon Mechanical Turk (MTurk) has emerged as a way to transcribe large amounts of speech. This paper presents a two-stage approach for the use of MTurk to transcribe one year of Let's Go Bus Information System data, corresponding to 156.74 hours (257,658 short utterances). This data was made available for the Spoken Dialog Challenge 2010 [1]. While others have used a one-stage approach, asking workers to label, for …

Cited by 46 publications (31 citation statements)
References 8 publications (10 reference statements)
“…We attribute this drop in accuracy to the inherently more complex nature of audio transcription tasks [57], where work environment specifics (such as device volume, headsets or other equipment) may play a role. This is exacerbated in the case of audios with poor quality (audio_poorQuality).…”
Section: Performance Across UI Element Variations
confidence: 99%
“…Workers' confidence in their simplifications can also be used to exclude simplifications which were submitted with low confidence (using worker confidence as a quality control filter was explored by Parent and Eskenazi (2010)). Worker agreement can also be used to detect simplifications that are very different from those submitted by other workers.…”
Section: Evaluating Simplification Quality
confidence: 99%
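The agreement-based filtering described above can be sketched as a simple majority vote over worker submissions. This is a minimal illustration, not the cited papers' implementation; the function name, normalization, and vote threshold are all assumptions.

```python
from collections import Counter

def filter_by_agreement(submissions, min_votes=2):
    """Keep only units where at least `min_votes` workers agree.

    `submissions` maps a unit id to the list of worker answers
    (e.g. transcriptions or simplifications) collected for that unit.
    Answers are normalized (whitespace/case) before voting; units
    without sufficient agreement are dropped for later review.
    """
    accepted = {}
    for unit_id, answers in submissions.items():
        normalized = [a.strip().lower() for a in answers]
        best, votes = Counter(normalized).most_common(1)[0]
        if votes >= min_votes:
            accepted[unit_id] = best
    return accepted

# Example: three workers transcribed the same (hypothetical) utterance.
votes = {"utt-1": ["61C to Oakland", "61c to oakland", "61 C Oakland"]}
print(filter_by_agreement(votes))  # → {'utt-1': '61c to oakland'}
```

In practice one would combine this with the self-reported confidence filter mentioned in the quote, discarding low-confidence submissions before voting.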
“…Recent studies demonstrate that transcriptions can be obtained for a fraction of the cost and processing time of conventional methods [5,1,6,2]. However, one of the major challenges connected with crowdsourcing is quality control [6,2], that is, ensuring that the transcriptions produced by non-expert contributors are accurate and complete. Several techniques for the control of the quality of crowdsourced transcriptions have been proposed.…”
Section: Relations To Prior Work
confidence: 99%
“…Some authors have developed a corrective workflow, whereby the same transcription is checked and iteratively refined by multiple contributors [11,4,2]. Parent and Eskenazi [6] employ an automatic quality control mechanism based on the concept of gold standard, whereby one utterance transcribed by an expert is inserted in each work unit and contributors' performance is evaluated in terms of how similar their transcriptions are to those produced by the experts.…”
Section: Relations To Prior Work
confidence: 99%
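The gold-standard mechanism described above — comparing a worker's transcription of a known utterance against an expert reference — can be sketched with a word-level edit distance. This is an illustrative sketch under assumed names and thresholds; the actual acceptance criterion used by Parent and Eskenazi is not specified here.

```python
def edit_distance(ref, hyp):
    """Word-level Levenshtein distance via a rolling dynamic-programming row."""
    r, h = ref.split(), hyp.split()
    dp = list(range(len(h) + 1))
    for i, rw in enumerate(r, 1):
        prev, dp[0] = dp[0], i
        for j, hw in enumerate(h, 1):
            cur = dp[j]
            # deletion, insertion, or substitution (free if words match)
            dp[j] = min(dp[j] + 1, dp[j - 1] + 1, prev + (rw != hw))
            prev = cur
    return dp[-1]

def passes_gold_check(gold, worker, max_wer=0.25):
    """Accept a work unit if the worker's transcription of the embedded
    gold utterance stays within a WER threshold (threshold is assumed)."""
    ref_len = max(len(gold.split()), 1)
    return edit_distance(gold, worker) / ref_len <= max_wer
```

A requester would embed one gold utterance per work unit and run `passes_gold_check` on it to decide whether to accept or reject that worker's batch.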