2010 IEEE Spoken Language Technology Workshop
DOI: 10.1109/slt.2010.5700870

Toward better crowdsourced transcription: Transcription of a year of the Let's Go Bus Information System data

Abstract: Transcription is typically a long and expensive process. In the last year, crowdsourcing through Amazon Mechanical Turk (MTurk) has emerged as a way to transcribe large amounts of speech. This paper presents a two-stage approach for the use of MTurk to transcribe one year of Let's Go Bus Information System data, corresponding to 156.74 hours (257,658 short utterances). This data was made available for the Spoken Dialog Challenge 2010 [1]. While others have used a one-stage approach, asking workers to label, for …

Cited by 46 publications (31 citation statements)
References 8 publications (10 reference statements)
“…We attribute this drop in accuracy to the inherently more complex nature of audio transcription tasks [57], where work environment specifics (such as device volume, headsets or other equipment) may play a role. This is exacerbated in the case of audios with poor quality (audio_poorQuality).…”
Section: Performance Across UI Element Variations
confidence: 99%
“…Workers' confidence in their simplifications can also be used to exclude simplifications which were submitted with low confidence (using worker confidence as a quality control filter was explored by Parent and Eskenazi (2010)). Worker agreement can also be used to detect simplifications that are very different from those submitted by other workers.…”
Section: Evaluating Simplification Quality
confidence: 99%
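The agreement-based filtering described above can be sketched as a simple majority vote over worker submissions. This is a minimal illustration, not the cited papers' implementation; the function name, normalization, and vote threshold are all assumptions.

```python
from collections import Counter

def filter_by_agreement(submissions, min_votes=2):
    """Keep only units where at least `min_votes` workers agree.

    `submissions` maps a unit id to the list of worker answers
    (e.g. transcriptions or simplifications) collected for that unit.
    Answers are normalized (whitespace/case) before voting; units
    without sufficient agreement are dropped for later review.
    """
    accepted = {}
    for unit_id, answers in submissions.items():
        normalized = [a.strip().lower() for a in answers]
        best, votes = Counter(normalized).most_common(1)[0]
        if votes >= min_votes:
            accepted[unit_id] = best
    return accepted

# Example: three workers transcribed the same (hypothetical) utterance.
votes = {"utt-1": ["61C to Oakland", "61c to oakland", "61 C Oakland"]}
print(filter_by_agreement(votes))  # → {'utt-1': '61c to oakland'}
```

In practice one would combine this with the self-reported confidence filter mentioned in the quote, discarding low-confidence submissions before voting.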
“…Recent studies demonstrate that transcriptions can be obtained for a fraction of the cost and processing time of conventional methods [5,1,6,2]. However, one of the major challenges connected with crowdsourcing is quality control [6,2], that is, ensuring that the transcriptions produced by non-expert contributors are accurate and complete. Several techniques for the control of the quality of crowdsourced transcriptions have been proposed.…”
Section: Relations To Prior Work
confidence: 99%
“…Some authors have developed a corrective workflow, whereby the same transcription is checked and iteratively refined by multiple contributors [11,4,2]. Parent and Eskenazi [6] employ an automatic quality control mechanism based on the concept of gold standard, whereby one utterance transcribed by an expert is inserted in each work unit and contributors' performance is evaluated in terms of how similar their transcriptions are to those produced by the experts.…”
Section: Relations To Prior Work
confidence: 99%
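The gold-standard mechanism described above — comparing a worker's transcription of a known utterance against an expert reference — can be sketched with a word-level edit distance. This is an illustrative sketch under assumed names and thresholds; the actual acceptance criterion used by Parent and Eskenazi is not specified here.

```python
def edit_distance(ref, hyp):
    """Word-level Levenshtein distance via a rolling dynamic-programming row."""
    r, h = ref.split(), hyp.split()
    dp = list(range(len(h) + 1))
    for i, rw in enumerate(r, 1):
        prev, dp[0] = dp[0], i
        for j, hw in enumerate(h, 1):
            cur = dp[j]
            # deletion, insertion, or substitution (free if words match)
            dp[j] = min(dp[j] + 1, dp[j - 1] + 1, prev + (rw != hw))
            prev = cur
    return dp[-1]

def passes_gold_check(gold, worker, max_wer=0.25):
    """Accept a work unit if the worker's transcription of the embedded
    gold utterance stays within a WER threshold (threshold is assumed)."""
    ref_len = max(len(gold.split()), 1)
    return edit_distance(gold, worker) / ref_len <= max_wer
```

A requester would embed one gold utterance per work unit and run `passes_gold_check` on it to decide whether to accept or reject that worker's batch.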