Proceedings of the Sixth Workshop on Noisy User-Generated Text (W-NUT 2020), 2020
DOI: 10.18653/v1/2020.wnut-1.19

Truecasing German user-generated conversational text

Abstract: Truecasing, the task of restoring proper case to (generally) lower-case input, is important in downstream tasks and for screen display. In this paper, we investigate truecasing as an intrinsic task and present several experiments on noisy user queries to a voice-controlled dialog system. In particular, we compare a rule-based approach, an n-gram language model (LM) approach and a recurrent neural network (RNN) approach, evaluating the results on a German Q&A corpus and reporting accuracy for different case categories. We show…
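To make the task concrete, the following is a minimal sketch of a most-frequent-form truecaser with per-category accuracy. It is not one of the systems described in the paper; it is a simple lookup baseline given for illustration. The function names, the four category labels (lower, upper, title, mixed) and the toy data are assumptions, not taken from the source.

```python
from collections import Counter, defaultdict


def case_category(token):
    """Assign a token to a coarse case category (labels are hypothetical)."""
    if token.islower() or not any(c.isalpha() for c in token):
        return "lower"
    if token.isupper():
        return "upper"
    if token[0].isupper() and token[1:].islower():
        return "title"
    return "mixed"


def train(sentences):
    """Map each lower-cased token to its most frequent cased form."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        for token in sentence:
            counts[token.lower()][token] += 1
    return {key: forms.most_common(1)[0][0] for key, forms in counts.items()}


def truecase(tokens, model):
    """Restore case by lookup; unseen tokens are returned unchanged."""
    return [model.get(token.lower(), token) for token in tokens]


def accuracy_by_category(gold, predicted):
    """Token accuracy, grouped by the case category of the gold token."""
    correct, total = Counter(), Counter()
    for g, p in zip(gold, predicted):
        category = case_category(g)
        total[category] += 1
        correct[category] += (g == p)
    return {category: correct[category] / total[category] for category in total}


# Toy training data (invented for illustration only).
model = train([
    ["das", "Wetter", "in", "Berlin"],
    ["das", "iPhone", "von", "Apple"],
])
print(truecase(["das", "wetter", "in", "berlin"], model))
```

A lookup of this kind cannot use context, so it cannot tell apart German words whose casing depends on usage, and it leaves unseen words as they are. Those limits are part of why context-aware models such as n-gram LMs and RNNs are worth comparing.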

Cited by 4 publications
(4 citation statements)
References 10 publications
“…[11] first introduced character-based LSTM for this task and completely solved the mixed case word problem. Recently, [2] compared character-based n-gram (n up to 15) language models with the character LSTM of [11]. [12] advanced the state of the art with a character-based CNN-LSTM-CRF model.…”
Section: Related Work (mentioning)
confidence: 99%
“…The vast amount of online text powers language models for speech recognition, typing suggestions and many other language generation tasks. However user-generated texts, especially those from mobile applications such as Twitter Tweets [1], often violate the grammatical rules of casing in English and other western languages [2]. The process of restoring the proper case, often known as tRuEcasIng [3], provides a factorized solution with a dedicated model for case normalization.…”
Section: Introduction (mentioning)
confidence: 99%
“…Susanto et al (2016) first introduced character-based LSTM for this task and completely solved the mixed case word problem. Recently, Grishina et al (2020) compared character-based n-gram (n up to 15) language models with the character LSTM of Susanto et al (2016). Ramena et al (2020) advanced the state of the art with a character-based CNN-LSTM-CRF model which introduced local output label dependencies.…”
Section: Related Work (mentioning)
confidence: 99%
“…Automatically generated texts such as speech recognition (ASR) transcripts as well as user-generated texts from mobile applications such as Twitter Tweets (Nebhi et al, 2015) often violate the grammatical rules of casing in English and other western languages (Grishina et al, 2020). The process of restoring the proper case, often known as tRuEcasIng (Lita et al, 2003), is not only important for the ease of consumption by end-users (e.g.…”
Section: Introduction (mentioning)
confidence: 99%