2016 5th Brazilian Conference on Intelligent Systems (BRACIS) 2016
DOI: 10.1109/bracis.2016.056

Discriminating between Brazilian and European Portuguese National Varieties on Twitter Texts

Cited by 8 publications (9 citation statements)
References: 11 publications
“…Lately Samih and Maier (2016) also used Kneser-Ney smoothing. Castro et al (2016, 2017) evaluated several smoothing techniques with character and word n-grams: Laplace/Lidstone, Witten-Bell, Good-Turing, and Kneser-Ney. In their evaluations, additive smoothing with 0.1 provided the best results.…”
Section: Good-turing Discounting
Citation type: mentioning
confidence: 99%
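
The additive (Lidstone) smoothing mentioned in this statement can be illustrated with a minimal sketch. It is not the cited authors' implementation: it assumes a simple bag-of-character-n-grams model per variety, and the function names, the trigram order, and the toy sentences are all hypothetical. Only the smoothing constant 0.1 comes from the statement above.

```python
import math
from collections import Counter


def char_ngrams(text, n):
    """Return the overlapping character n-grams of text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def train(samples_by_label, n=3, alpha=0.1):
    """Count character n-grams per label (hypothetical helper)."""
    counts = {
        label: Counter(g for t in texts for g in char_ngrams(t, n))
        for label, texts in samples_by_label.items()
    }
    # Shared vocabulary across labels; +1 reserves a slot for unseen n-grams.
    vocab = set().union(*counts.values())
    return {"n": n, "alpha": alpha, "counts": counts,
            "vocab_size": len(vocab) + 1}


def log_score(model, label, text):
    """Sum of additively smoothed log-probabilities of the text's n-grams."""
    counts = model["counts"][label]
    alpha = model["alpha"]
    denom = sum(counts.values()) + alpha * model["vocab_size"]
    return sum(math.log((counts[g] + alpha) / denom)
               for g in char_ngrams(text, model["n"]))


def classify(model, text):
    """Pick the label whose smoothed model scores the text highest."""
    return max(model["counts"], key=lambda lab: log_score(model, lab, text))


# Toy, made-up training sentences (not data from the paper).
model = train({
    "pt-BR": ["você está falando no celular", "o ônibus chegou"],
    "pt-PT": ["estás a falar ao telemóvel", "o autocarro chegou"],
})
```

With alpha = 0.1, an n-gram never seen for a variety still receives a small non-zero probability, so a single unseen n-gram does not drive the whole score to minus infinity.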
“…The fourth edition of the DSL shared task was motivated by the success of the previous editions and by the growing interest of the research community in the identification of dialects and similar languages, as evidenced by recent publications (Xu et al, 2016; Radford and Gallé, 2016; Castro et al, 2016). We also saw the number of system submissions to the DSL challenge grow from 8 in 2014 to 10 in 2015 and then to 17 in 2016.…”
Section: Discriminating Between Similar Languages (Dsl)
Citation type: mentioning
confidence: 99%
“…They reported variety identification accuracies of 99.6%, 91.2% and 99.8% with word unigrams, word bigrams and character 4-grams, respectively. Also in Portuguese, Castro et al (2016) combined character 6-grams with word unigrams and bigrams allowed obtaining an accuracy of 92.71% in Twitter texts. In case of Spanish, Maier and Gómez-Rodríguez (2014) combined language models with n-grams allowed reaching accuracies in the range of 60%–70% in variety identification among Argentinian, Chilean, Colombian, Mexican and Spanish also on Twitter texts.…”
Section: Related Work
Citation type: mentioning
confidence: 99%
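
The feature combination described in this statement (character 6-grams together with word unigrams and bigrams) could be extracted roughly as follows. This is an illustrative sketch only: the function name, the feature-key prefixes, the lowercasing, and the whitespace tokenization are assumptions, and the statement does not say how the cited work tokenized or weighted its features.

```python
from collections import Counter


def extract_features(text, char_n=6):
    """Count character n-grams plus word unigrams and bigrams (hypothetical helper)."""
    text = text.lower()
    tokens = text.split()  # naive whitespace tokenization (assumption)
    feats = Counter()
    # Character n-grams, prefixed so they cannot collide with word features.
    feats.update("c:" + text[i:i + char_n]
                 for i in range(len(text) - char_n + 1))
    # Word unigrams.
    feats.update("w1:" + tok for tok in tokens)
    # Word bigrams over adjacent tokens.
    feats.update("w2:" + a + " " + b for a, b in zip(tokens, tokens[1:]))
    return feats


features = extract_features("O autocarro chegou")
```

The resulting counts can be fed to any bag-of-features classifier; which classifier the cited work used is not stated here.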