2015
DOI: 10.1007/s10579-015-9303-x

Mandarin–English code-switching speech corpus in South-East Asia: SEAME

Cited by 76 publications (82 citation statements)
References 25 publications
“…Of course, typological and comparative works exist (Lim & Gisborne, 2009; Sharma, 2009; Kortmann & Szmrecsanyi, 2011), but they too make extensive use of the concept. However, it is conceivable, and desirable, to complement these qualitative approaches with quantitative data: the compilation of non-monolingual corpora (Deuchar et al, forthcoming; Lyu et al, 2010) is a first step in the right direction, and may provide us with data that may have the potential to better inform our understanding of how language variation in multilingual settings is best modelled. Szmrecsanyi, 2011) and measures of phonetic similarity (McMahon et al, 2007), but here too, the starting point is, more often than not, a conceptual linguistic system tied to a particular locale, not least because of the sampling and collection methods employed in the corpora they use.…”
Section: Discussion (mentioning)
confidence: 99%
“…The South East Asia Mandarin-English (SEAME) corpus [17] was used for the following experiments. It can be simply separated into two parts by its literal language.…”
Section: Corpus (mentioning)
confidence: 99%
“…Even in their data, the percentage of code-switched tweets was barely over a tenth of the total test data. There have been other corpora built, particularly for other language pairs such as Mandarin-English (Li et al, 2012; Lyu et al, 2010), but the amount of data available and the percentage of code-switching data within that data are not up to the standards of other areas of the natural language processing field. With this in mind, we sought to provide corpora for multiple language pairs, each with a better distribution of code-switching phenomena.…”
Section: Related Work (mentioning)
confidence: 99%