2021
DOI: 10.1002/int.22746

The unreasonable effectiveness of machine learning in Moldavian versus Romanian dialect identification

Abstract: Motivated by the seemingly high accuracy levels of machine learning (ML) models in Moldavian versus Romanian dialect identification and the increasing research interest on this topic, we provide a follow-up on the Moldavian versus Romanian Cross-Dialect Topic Identification (MRC) shared task of the VarDial 2019 evaluation campaign. The shared task included two subtask types: one that consisted in discriminating between the Moldavian and Romanian dialects and one that consisted in classifying documents by topic…

Cited by 2 publications (3 citation statements)
References 135 publications (251 reference statements)
“…This has led to a growing need for resources on low-resource languages. Considering dialect identification datasets across different languages, we can distinguish between two types of resources: text-based datasets [20,21,22,23,24] and speech-based datasets [10,11,12,13,14,15]. While various languages have benefited from text-based resources that leverage written materials capturing linguistic variations, the auditory dimension of dialects adds an intricate layer of complexity.…”
Section: Introduction (mentioning)
confidence: 99%
“…To the best of our knowledge, RoDia is the first dataset to tackle spoken dialect identification in the Romanian landscape in accordance with historical, geographical, and sociocultural factors, encouraging the research in this low-resource language. Although there are two text datasets addressing Romanian dialect identification, MOROCO [21] and MOROCO-Tweets [24], these cover only two dialects: Romanian (equivalent to the Muntenesc dialect) and Moldavian (Moldovenesc). In contrast, our dataset is focused on speech and covers all five Romanian dialects, as shown in Figure 1.…”
Section: Introduction (mentioning)
confidence: 99%