MaSS: A Large and Clean Multilingual Corpus of Sentence-aligned Spoken Utterances Extracted from the Bible

Boito, Marcely Zanon; Havard, William N.; Garnerin, Mahault; Le Ferrand, Éric; Besacier, Laurent

doi:10.48550/arxiv.1907.12895

Cited by 1 publication (2 citation statements); references: 0 publications

“…As the dataset has been a popular and useful resource, it has been further extended with captions in other languages such as Chinese (Li et al, 2016) and Turkish (Unal et al, 2016). However, as these captions were independently crowdsourced, they are not translations of each other, which makes them less effective for MMT.…”

Table fragment captured with the excerpt (the first dataset name was not recovered):

Dataset | Reference | Audio | Segments | Languages
… | Federmann and Lewis, 2017 | 4.5-10h | 7k-18k | de, en, fr, ja, zh
IWSLT '18 | Niehues et al, 2018 | 1,565 audio clips | 171k | de, en
LibriSpeech | Kocabiyikoglu et al, 2018 | 236h | 131k | en, fr
MuST-C | Di Gangi et al, 2019a | 385-504h | 211k-280k | 10 languages
MaSS | Boito et al, 2019 | 18.5-23h | 8.2k | 8 languages

Section: Flickr8k; citation type: mentioning; confidence: 99%

Multimodal Machine Translation through Visuals and Speech

Sulubacak, Çağlayan, Grönroos et al. 2019

Preprint

Multimodal machine translation involves drawing information from more than one modality, based on the assumption that the additional modalities will contain useful alternative views of the input data. The most prominent tasks in this area are spoken language translation, which exploits the audio modality, and image-guided and video-guided translation, which exploit the visual modality. These tasks are distinguished from their monolingual counterparts of speech recognition, image captioning, and video captioning by the requirement that models generate outputs in a language different from that of the input. This survey reviews the major data resources for these tasks, the evaluation campaigns concentrated around them, the state of the art in end-to-end and pipeline approaches, and the challenges in performance evaluation. The paper concludes with a discussion of directions for future research in these areas: the need for more expansive and challenging datasets, for targeted evaluations of model performance, and for multimodality in both the input and output space.

“…The Multilingual corpus of Sentence-aligned Spoken utterances (MaSS) (Boito et al, 2019) is a multilingual corpus of read Bible verses and chapter names from the New Testament. It is fully multi-parallel across 8 languages (Basque, English, Finnish, French, Hungarian, Romanian, Russian, and Spanish), comprising 56 language pairs in total.…”

Section: MaSS; citation type: mentioning; confidence: 99%
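The figure of 56 language pairs in the MaSS excerpt is consistent with counting ordered (source, target) pairs: each of the 8 languages can be translated into 7 others, giving 8 × 7 = 56. An unordered count would give only 28. The short sketch below checks this arithmetic. The names `MASS_LANGUAGES` and `directed_pairs` are hypothetical illustrations, not part of any MaSS release, and treating pairs as directional is an assumption inferred from the stated total.

```python
from itertools import permutations

# The 8 MaSS languages as listed in the citation excerpt.
MASS_LANGUAGES = [
    "Basque", "English", "Finnish", "French",
    "Hungarian", "Romanian", "Russian", "Spanish",
]

def directed_pairs(languages):
    """Hypothetical helper: every ordered (source, target) pair with source != target."""
    return list(permutations(languages, 2))

pairs = directed_pairs(MASS_LANGUAGES)

# Ordered pairs: 8 * 7 = 56, matching the count quoted for MaSS.
print(len(pairs))

# Collapsing direction (English-French == French-English) halves the count to 28.
print(len({frozenset(p) for p in pairs}))
```

Because the corpus is fully multi-parallel, every verse is available in all 8 languages, so each of these ordered pairs can in principle serve as a translation direction.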