Proceedings of the 8th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management 2016
DOI: 10.5220/0006083004130418

Evaluation of Statistical Text Normalisation Techniques for Twitter

Abstract: One of the major challenges in the era of big data is how to 'clean' the vast amount of data, particularly from micro-blog websites like Twitter. Twitter messages, called tweets, are commonly written in ill-formed language, including abbreviations, repeated characters, and misspelled words. These 'noisy tweets' require text normalisation techniques to detect and convert them into more accurate English sentences. Several existing techniques have been proposed to solve these issues; however, each technique pos…

Citations: cited by 2 publications (10 citation statements)
References: 6 publications

“…Table 5 lists the number of correct, incorrect, and non-normalized words. These same 300 words were then put through the normalization methods proposed by [12, 39]. Both these models were able to use the regular expression method and a spell-check algorithm to normalize OOV words with repeated letters resulting in impressive outcomes.…”
Section: Experimental Evaluation and Results (mentioning)
confidence: 99%
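
The citation statement above refers to a regular expression method combined with a spell-check algorithm for out-of-vocabulary (OOV) words with repeated letters. The following is a minimal sketch of that general idea only, not the method of the cited paper or of the citing authors. The vocabulary `VOCAB`, the helper names, and the use of `difflib` as a stand-in spell checker are all assumptions made for illustration.

```python
import difflib
import re

# Hypothetical toy vocabulary; a real system would use a full dictionary.
VOCAB = {"so", "good", "cool", "happy", "too", "see"}

# Matches any character repeated three or more times in a row.
_REPEAT = re.compile(r"(.)\1{2,}")


def candidates(word):
    """Return the lowercased word, then runs of 3+ cut to two, then to one."""
    lowered = word.lower()
    two = _REPEAT.sub(r"\1\1", lowered)
    one = _REPEAT.sub(r"\1", lowered)
    return [lowered, two, one]


def normalise(word, vocab=VOCAB):
    """Normalise a word with repeated letters; return it unchanged on failure."""
    cands = candidates(word)
    # Step 1: accept the first reduced form found in the vocabulary.
    for cand in cands:
        if cand in vocab:
            return cand
    # Step 2 (assumed stand-in for a spell checker): closest vocabulary entry.
    close = difflib.get_close_matches(cands[-1], sorted(vocab), n=1, cutoff=0.8)
    return close[0] if close else word


if __name__ == "__main__":
    for token in ["sooooo", "goooood", "happyyyy", "xyzzzz"]:
        print(token, "->", normalise(token))
```

Trying the two-letter reduction before the one-letter reduction keeps legitimate double letters such as the "oo" in "good", while words that match nothing are left as they are rather than guessed.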
“…Both these models were able to use the regular expression method and a spell-check algorithm to normalize OOV words with repeated letters resulting in impressive outcomes. Table 5 provides a comparison of the outcomes of the RBPsWRL-Sym model and that of the normalization models proposed by [12, 39]. As seen in Fig 20, the RBPsWRL-Sym model increased the F1 score from 78% and 81% to 88%.…”
Section: Experimental Evaluation and Results (mentioning)
confidence: 99%
“…Generally, misspelled words were detected by Natural Language Processing (NLP) systems using the mult-channel models which effectively find the lexical variance on some factors such as contextual wounding of the word, phonetic similarity, orthographic factors, and expansion of acronym using the standard dictionary. As suggested by previous researchers [20][21][22][23], they have utilized the Aspell spell corrector to detect the misspelling on Twitter as well as on SMS datasets.…”
Section: Related Work (mentioning)
confidence: 99%