Proceedings of the First Workshop on Applying NLP Tools to Similar Languages, Varieties and Dialects 2014
DOI: 10.3115/v1/w14-5314

Using Maximum Entropy Models to Discriminate between Similar Languages and Varieties

Abstract: DSLRAE is a hierarchical classifier for similar written languages and varieties based on maximum-entropy (maxent) classifiers. In the first level, the text is classified into a language group using a simple token-based maxent classifier. At the second level, a group-specific maxent classifier is applied to classify the text as one of the languages or varieties within the previously identified group. For each group of languages, the classifier uses a different kind and combination of knowledge-poor features: to…
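The two-level design described in the abstract can be sketched in plain Python: a token-based maxent (multinomial logistic regression) classifier first picks the language group, then a group-specific maxent classifier, here using character n-gram features as one knowledge-poor option, picks the variety. All data, labels, and hyperparameters below are illustrative rather than the paper's, and regularization (e.g. a Gaussian prior) is omitted for brevity.

```python
import math
from collections import defaultdict

def features(text, char_ngrams=False):
    # Knowledge-poor features: whitespace tokens, or character 1-3-grams.
    if char_ngrams:
        return [text[i:i + n] for n in (1, 2, 3)
                for i in range(len(text) - n + 1)]
    return text.split()

def train_maxent(data, char_ngrams=False, epochs=200, lr=0.5):
    # Multinomial logistic regression ("maximum entropy") trained by
    # plain gradient ascent on the log-likelihood; no prior/regularizer.
    labels = sorted({y for _, y in data})
    w = {y: defaultdict(float) for y in labels}
    for _ in range(epochs):
        for text, gold in data:
            feats = features(text, char_ngrams)
            scores = {y: sum(w[y][f] for f in feats) for y in labels}
            z = sum(math.exp(s) for s in scores.values())
            for y in labels:
                grad = (1.0 if y == gold else 0.0) - math.exp(scores[y]) / z
                for f in feats:
                    w[y][f] += lr * grad
    def predict(text):
        feats = features(text, char_ngrams)
        return max(labels, key=lambda y: sum(w[y][f] for f in feats))
    return predict

# Toy (text, group, variety) examples -- purely illustrative.
train = [
    ("o governo anunciou novas medidas hoje", "pt", "pt-BR"),
    ("o governo anunciou as medidas ontem", "pt", "pt-PT"),
    ("el gobierno anunció nuevas medidas hoy", "es", "es-ES"),
    ("el gobierno anunció las medidas ayer", "es", "es-AR"),
]

# Level 1: token-based group classifier over the whole training set.
group_clf = train_maxent([(t, g) for t, g, _ in train])

# Level 2: one group-specific classifier per group, on character n-grams.
lang_clfs = {
    g: train_maxent([(t, v) for t, gg, v in train if gg == g],
                    char_ngrams=True)
    for g in {g for _, g, _ in train}
}

def classify(text):
    group = group_clf(text)
    return group, lang_clfs[group](text)
```

Routing through a group classifier first keeps each second-level model small and lets each group use the feature set that best separates its members.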


Cited by 13 publications (10 citation statements) | References 10 publications
“…Logistic Regression (LR) Chen and Maison (2003) used a logistic regression ("LR") model (also commonly referred to as "maximum entropy" within NLP), smoothed with a Gaussian prior. Porta and Sancho (2014) defined LR for character-based features as follows:…”
Section: Entropy (mentioning, confidence: 99%)
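The snippet above truncates the definition it quotes. For context, the standard multinomial logistic-regression (maxent) model over features \(f_i(x)\), with the Gaussian prior the citation mentions, has this general form (a generic textbook formulation, not necessarily the exact one given in Porta and Sancho (2014)):

```latex
P(y \mid x) = \frac{\exp\big(\textstyle\sum_i w_{y,i}\, f_i(x)\big)}
                   {\sum_{y'} \exp\big(\textstyle\sum_i w_{y',i}\, f_i(x)\big)},
\qquad
\mathcal{L}(w) = \sum_{(x,y)} \log P(y \mid x) \;-\; \frac{1}{2\sigma^2} \sum_{y,i} w_{y,i}^2
```

The second term is the Gaussian (L2) prior with variance \(\sigma^2\), which smooths the weights against sparse character-based features.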
“…An important challenge has been the development of methods to measure the distance between very similar languages or variants and for short texts, where more precision is required, such as in Porta and Sancho (2014); Purver (2014) and Goutte, Léger, Malmasi, and Zampieri (2016).…”
Section: Corpus-driven Methodologies (mentioning, confidence: 99%)
“…In the four editions of the DSL shared task a variety of computational methods have been tested. This includes Maximum Entropy (Porta and Sancho, 2014), Prediction by Partial Matching (PPM) (Bobicev, 2015), language model perplexity (Gamallo et al., 2017), SVMs (Purver, 2014), Convolutional Neural Networks (CNNs) (Belinkov and Glass, 2016), word-based back-off models (Jauhiainen et al., 2015; Jauhiainen et al., 2016), and classifier ensembles, the approach we apply in this paper.…”
Section: Related Work (mentioning, confidence: 99%)