2011
DOI: 10.1016/j.sbspro.2011.10.577

Text Normalization in Social Media: Progress, Problems and Applications for a Pre-Processing System of Casual English

Cited by 114 publications (73 citation statements)
References 2 publications
“…These challenges are largely due to the inherent main social data characteristics (summarized in • Vast social data sizes: Demand scalable solutions, dealing with computational time complexities required by conventional text mining algorithms. Emerging clustering approaches should be considered to result in efficient social text data analysis, since data need to be processed at a limited amount of time [12,13]. Current parallel and distributed infrastructures are proposed to meet the scaling demands of the clustering algorithms, and already several parallel or distributed clustering approaches have been proposed reducing both the computational cost and the execution time [12,14,15].…”
Section: Challenges In Crowdsourced Trend Analysis (mentioning)
confidence: 99%
“…Processing Social Media Text Finally, while English NLP for social media has attracted considerable attention recently (Clark and Araki, 2011; Gimpel et al, 2011; Gouws et al, 2011; Ritter et al, 2011; Derczynski et al, 2013), there has not been much work on Arabic yet. Darwish et al (2012) discuss NLP problems in retrieving Arabic microblogs (tweets).…”
Section: Related Work (mentioning)
confidence: 99%
“…While Sproat et al (2001) work with four different types of text (newspaper text, real estate ads, and servlist texts on the topics of palmtop computers and cooking recipes), most later papers deal either with SMS texts - see for instance Choudhury et al (2007), Kobus et al (2008), or Cook and Stevenson (2009) - or Twitter text - see for instance Clark and Araki (2011), Brody and Diakopoulos (2011), Foster et al (2011), Han and Baldwin (2011), Hassan and Menezes (2013) or Eisenstein (2013). Also, Liu et al (2012) have worked on both SMS and Twitter datasets.…”
Section: Text Normalization As a Task (mentioning)
confidence: 99%
“…Zhu et al (2007)'s approach tackles the normalization task as a tagging problem, where the different normalization transformations required are performed on each token depending on its type (line break, space, punctuation, word and special). Clark (2003) and Clark and Araki (2011) use rule-based approaches while Henríquez Q. and Hernández (2009) approach the task using a statistical machine translation system trained on original texts and their semi-automatically corrected version. More recently, Hassan and Menezes (2013) have shown how a version of Markov Random Walks can be used to train a normalizer on a small manually corrected corpus - without any further labeling.…”
Section: Approaches To Text Normalization (mentioning)
confidence: 99%