2006
DOI: 10.1007/11939993_78

Multilingual Spoken Language Corpus Development for Communication Research

Abstract: Multilingual spoken language corpora are indispensable for research in areas of spoken language communication, such as speech-to-speech translation. The speech and natural language processing essential to multilingual spoken language research requires a unified structure and annotation, such as tagging. In this study, we describe our experience with multilingual spoken language corpus development at our research institution, focusing in particular on speech recognition and natural language processing for speech t…

Cited by 26 publications (34 citation statements). References 7 publications.
“…Dima et al (2018) presented a small dependency treebank of travel domain sentences in Modern Standard Arabic. The corpus is created by translating the selected 2,000 sentences from the Basic Travelling Expression Corpus (BTEC) presented by Takezawa in Takezawa (2006). Different parallel corpora were proposed for MSA such as those presented by: 1) Ziemski et al (2016) dealing, in addition to MSA, with five other languages: Chinese, English, French, Russian, Spanish.…”
Section: Building Resources (mentioning; confidence: 99%)
“…In previous studies, BTEC has been evaluated as the basic dataset for S2ST purpose [1,4]. Similarly, Bengali translated BTEC also needs to be analyzed in view of the following important points:…”
Section: Outline of Analysis (mentioning; confidence: 99%)
“…One of the main contributions of this work is using the neural MT approach for the Chinese-Spanish language pair. In the last years, there has appeared more and more resources for this language pair available in [Ziemski et al, 2016a] or from TAUS [Takezawa, 2006]. The TAUS corpus is around 2,890,000 sentences, the Bible corpus about 30,000 sentences and the BTEC corpus about 20,000 sentences.…”
Section: Data and Preprocessing (mentioning; confidence: 99%)