2022
DOI: 10.1017/s1351324922000213

Comparison of text preprocessing methods

Abstract: Text preprocessing is not only an essential step to prepare the corpus for modeling but also a key area that directly affects the natural language processing (NLP) application results. For instance, precise tokenization increases the accuracy of part-of-speech (POS) tagging, and retaining multiword expressions improves reasoning and machine translation. The text corpus needs to be appropriately preprocessed before it is ready to serve as the input to computer models. The preprocessing requirements depend on bo…

Citations: cited by 27 publications (18 citation statements)
References: 391 publications (458 reference statements)
“…Two approaches to SA are (1) lexicon based and (2) machine learning based methods [4]. We note that appropriate methods for preprocessing depend on the approach used [2].…”
Section: A Related Work
Citation type: mentioning
confidence: 99%
“…The review has not been converted to all lower case yet, and the punctuation has not been removed or adjusted either. Caution is taken when removing punctuation from the corpus [2], as a reviewer's use of punctuation in an emotional context, e.g. repeated exclamation marks "!!!…”
Section: Order Of Processes
Citation type: mentioning
confidence: 99%
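
The caution about punctuation in the statement above can be illustrated with a minimal Python sketch. It is not the method of the cited paper or of the citing one: the function name `preprocess_review`, the token pattern, and the choice to treat two or more consecutive "!" or "?" characters as emphasis are all assumptions made for illustration. The sketch lowercases a review and drops ordinary punctuation, but keeps emphatic runs as tokens so that a later sentiment step can still see them.

```python
import re

# Hypothetical token pattern (an assumption, not taken from the cited work):
#   [!?]{2,}       a run of two or more "!" / "?" characters, kept as one token
#   \w+(?:'\w+)*   a word, optionally with internal apostrophes ("don't")
# Anything that matches neither alternative (commas, full stops, a single "!")
# is skipped and therefore removed.
TOKEN_PATTERN = re.compile(r"[!?]{2,}|\w+(?:'\w+)*")


def preprocess_review(text):
    """Lowercase a review and tokenize it, keeping emphatic punctuation runs."""
    return TOKEN_PATTERN.findall(text.lower())


if __name__ == "__main__":
    print(preprocess_review("This phone is AMAZING!!! Best purchase, ever."))
```

Whether a single "!" should also be kept, and whether lowercasing should run before or after other steps, depends on the downstream approach (lexicon based or machine learning based), so the threshold of two characters here is only a placeholder.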