2012 IEEE Spoken Language Technology Workshop (SLT) 2012
DOI: 10.1109/slt.2012.6424200

Crowdsourcing the acquisition of natural language corpora: Methods and observations

Abstract: We study the opportunity for using crowdsourcing methods to acquire language corpora for use in natural language processing systems. Specifically, we empirically investigate three methods for eliciting natural language sentences that correspond to a given semantic form. The methods convey frame semantics to crowd workers by means of sentences, scenarios, and list-based descriptions. We discuss various performance measures of the crowdsourcing process, and analyze the semantic correctness, naturalness, and bias…

citations
Cited by 39 publications
(44 citation statements)
references
References 8 publications
“…Such a workload distribution was previously described in (Wang et al, 2012) as appropriate for a between-subject design. Each batch corresponded to one of two conditions: the first batch contained only textual/logical MRs, and the second one used only pictorial MRs.…”
Section: Results: Collected Data (mentioning)
confidence: 99%
“…For example, (Zaidan and Callison-Burch, 2011) showed that crowdsourcing can result in datasets of comparable quality to those created by professional translators given appropriate quality control methods. (Mairesse et al, 2010) demonstrate that crowd workers can produce NL descriptions from abstract MRs, a method which also has shown success in related NLP tasks, such as Spoken Dialogue Systems (Wang et al, 2012) or Semantic Parsing . However, when collecting corpora for training NLG systems, new challenges arise: (1) How to ensure the required high quality of the collected data?…”
Section: Introduction (mentioning)
confidence: 99%
“…Crowdsourcing services such as Amazon Mechanical Turk have been used to collect paraphrase sets that serve as NLP benchmarks [8,9,24]. Essentially, workers worldwide are paid tiny amounts to paraphrase individual example sentences or concepts.…”
Section: Template Set Amplification (mentioning)
confidence: 99%
“…Crowdsourcing is a very popular method for various natural language and speech processing tasks [9,10,11]. Examples include sentence translation from one language to another or gathering annotations on bilingual lexical entries [12,13], as well as paraphrasing applications [14,15].…”
Section: Introduction (mentioning)
confidence: 99%