Proceedings of the Workshop on Noisy User-Generated Text 2015
DOI: 10.18653/v1/w15-4319

Shared Tasks of the 2015 Workshop on Noisy User-generated Text: Twitter Lexical Normalization and Named Entity Recognition

Abstract: This paper presents the results of the two shared tasks associated with W-NUT 2015: (1) a text normalization task with 10 participants; and (2) a named entity tagging task with 8 participants. We outline the task, annotation process and dataset statistics, and provide a high-level overview of the participating systems for each shared task.

Cited by 162 publications (161 citation statements)
References 27 publications (24 reference statements)

“…In 2015, the Workshop on Noisy User-generated Text (W-NUT) [4] Table 1 Named Entity Recognition and Linking challenges since 2013…”
Section: W-NUT (mentioning)
confidence: 99%
“…In CMC, items like emoticons have no corresponding standard form and require a special treatment when normalizing these texts. E.g., for the shared task of normalizing Twitter data (Baldwin et al, 2015) only all-alphanumeric tokens are normalized. This excludes tokens like =), :) and :-) from the normalization.…”
Section: Related Work (mentioning)
confidence: 99%
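The token filter described in the statement above can be sketched as follows. This is a minimal illustration that assumes "all-alphanumeric" can be approximated with Python's built-in `str.isalnum()`; the function name `is_normalization_candidate` is hypothetical, and the shared task's exact eligibility definition may differ.

```python
def is_normalization_candidate(token: str) -> bool:
    """Return True if the token consists only of letters and digits.

    Assumption: str.isalnum() stands in for the "all-alphanumeric" rule
    quoted above; it is not taken from the shared task's own tooling.
    """
    return token.isalnum()


tokens = ["u", "2morrow", "=)", ":)", ":-)", "gr8"]

# Emoticons contain punctuation, so the filter leaves them untouched.
candidates = [t for t in tokens if is_normalization_candidate(t)]
skipped = [t for t in tokens if not is_normalization_candidate(t)]
```

Here `candidates` holds the tokens a normalizer would consider, while `skipped` holds the emoticons that pass through unchanged.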
“…This is, however, negligible for standard texts: only 7% of the morphological words appearing more than once in the Tiger corpus (Brants et al, 2004), a corpus consisting of German newspaper texts, show variance, i.e., are realized by more than one type. In non-standard texts, there is more variation: In the English Twitter texts used as the training data for the W-NUT 2015 shared task on normalization (Baldwin et al, 2015), 57% of the morphological words show variation. This can be reduced to 16% by lowercasing every type.…”
Section: Related Work (mentioning)
confidence: 99%
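The variation measure in the statement above can be illustrated with a small sketch. It assumes the data is available as (surface type, morphological word) pairs, and it reads "variation" as: among morphological words occurring more than once, the share realized by more than one surface type. The function name `variation_rate` and the toy data are hypothetical and do not reproduce the cited figures.

```python
from collections import defaultdict


def variation_rate(pairs, lowercase=False):
    """Share of repeated morphological words with more than one surface type.

    pairs: iterable of (surface_type, morphological_word) tuples.
    lowercase: if True, lowercase every surface type before comparing.
    """
    surface_types = defaultdict(set)
    counts = defaultdict(int)
    for surface, word in pairs:
        if lowercase:
            surface = surface.lower()
        surface_types[word].add(surface)
        counts[word] += 1

    # Only morphological words that appear more than once are considered.
    repeated = [w for w, c in counts.items() if c > 1]
    if not repeated:
        return 0.0
    varying = sum(1 for w in repeated if len(surface_types[w]) > 1)
    return varying / len(repeated)


# Toy data: (surface type, morphological word) pairs.
pairs = [
    ("You", "you"), ("you", "you"), ("u", "you"),
    ("Tomorrow", "tomorrow"), ("tomorrow", "tomorrow"),
    ("the", "the"), ("the", "the"),
    ("lol", "lol"),
]

rate_raw = variation_rate(pairs)
rate_lower = variation_rate(pairs, lowercase=True)
```

On this toy data, lowercasing removes the variation that is due to capitalization alone ("Tomorrow" vs. "tomorrow") but not spelling variation ("u" vs. "you"), which is the kind of reduction the statement describes.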
“…These new challenges will push the state of the art in these speech processing tasks. The orthographic regularization shared task builds on other work on orthographic regularization in widely spoken languages (see, for example, Mohit et al, 2014; Rozovskaya et al, 2015; Baldwin et al, 2015 on social media text and Dale and Kilgarriff (2011) on text produced by language learners), but pushes the frontiers of work in this area in several ways: While this proposed shared task has much in common with these previous shared tasks, endangered language text normalization poses additional interesting problems. In languages like English or Arabic, there is usually a single, established orthography in which almost all users have formal schooling and extensive digital corpora in this orthography that establish "correct" practices.…”
Section: Intellectual Merit: Research Interest In (mentioning)
confidence: 99%