RANLP 2017 - Recent Advances in Natural Language Processing Meet Deep Learning 2017
DOI: 10.26615/978-954-452-049-6_086

Identifying the Authors’ National Variety of English in Social Media Texts

Abstract: In this paper, we present a study for the identification of authors' national variety of English in texts from social media. In data from Facebook and Twitter, information about the author's social profile is annotated, and the national English variety (US, UK, AUS, CAN, NNS) that each author uses is attributed. We tested four feature types: formal linguistic features, POS features, lexicon-based features related to the different varieties, and data-based features from each English variety. We used various mach…

Cited by 14 publications (19 citation statements)
References 26 publications

“…Elfardy and Diab (2013) used emoticons (or emojis) in Arabic dialect identification with Naive Bayes ("NB"; see Section 6.5). Non-alphabetic characters have also been used by Basile et al (2017), Bestgen (2017), Samih (2017), and Simaki et al (2017). Henrich (1989) used knowledge of alphabets to exclude languages where a language-unique character in a test document did not appear.…”
Section: Characters | Citation type: mentioning
confidence: 99%
“…Capitalization Capitalization is mostly preserved when calculating character n-gram frequencies, but in contexts where it is possible to identify the orthography of a given document and where capitalization exists in the orthography, lowercasing can be used to reduce sparseness. In recent LI work, capitalization was used as a special feature by Basile et al (2017), Bestgen (2017), and Simaki et al (2017).…”
Section: Alphabets | Citation type: mentioning
confidence: 99%
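
The trade-off described in the citation statement above (keeping case in character n-gram counts versus lowercasing to reduce sparseness, with capitalization kept as a separate feature) can be illustrated with a minimal Python sketch. This is not the method of Simaki et al. (2017) or of the citing paper; the function names `char_ngrams` and `capitalization_ratio`, the trigram size, and the sample string are all assumptions chosen for illustration.

```python
from collections import Counter


def char_ngrams(text: str, n: int = 3, lowercase: bool = False) -> Counter:
    """Count overlapping character n-grams (hypothetical helper).

    With lowercase=True the text is case-folded first, so n-grams that
    differ only in case are merged into a single count.
    """
    if lowercase:
        text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def capitalization_ratio(text: str) -> float:
    """Share of alphabetic characters that are uppercase (hypothetical feature).

    Returns 0.0 when the text contains no alphabetic characters.
    """
    letters = [c for c in text if c.isalpha()]
    if not letters:
        return 0.0
    return sum(c.isupper() for c in letters) / len(letters)


# Illustrative input only; not taken from the paper's data.
sample = "The colour. THE COLOUR. the colour."

cased = char_ngrams(sample, n=3)
folded = char_ngrams(sample, n=3, lowercase=True)

# Lowercasing merges "The", "THE" and "the" into one n-gram type,
# so the folded vocabulary is smaller (less sparse) than the cased one.
print("distinct trigrams, case preserved:", len(cased))
print("distinct trigrams, lowercased:   ", len(folded))

# The case information discarded by lowercasing can be kept as its own feature.
print("capitalization ratio:", capitalization_ratio(sample))
```

In this sketch the lowercased counts pool evidence across case variants, while `capitalization_ratio` retains, as a single number, the case information that lowercasing throws away.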