2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp.2017.7953260
Weakly supervised spoken term discovery using cross-lingual side information

Abstract: Recent work on unsupervised term discovery (UTD) aims to identify and cluster repeated word-like units from audio alone. These systems are promising for some very low-resource languages where transcribed audio is unavailable, or where no written form of the language exists. However, in some cases it may still be feasible (e.g., through crowdsourcing) to obtain (possibly noisy) text translations of the audio. If so, this information could be used as a source of side information to improve UTD. Here, we present …

Cited by 10 publications (8 citation statements); references 21 publications (30 reference statements).
“…Poor cross-speaker matches and low audio coverage prevent our system from achieving a high recall, suggesting the use of speech features that are effective in multi-speaker settings (Kamper et al., 2015; Kamper et al., 2016a) and speaker normalization (Zeghidour et al., 2016). Finally, Bansal et al. (2017) recently showed that UTD can be improved using the translations themselves as a source of information, which suggests joint learning as an attractive area for future work.…”
Section: Discussion
Confidence: 98%
“…Many of these errors are due to cross-speaker matches, which are known to be more challenging for UTD (Carlin et al., 2011; Kamper et al., 2015; Bansal et al., 2017). Most matches in our corpus are across calls, yet these are also the least accurate (Table 1).…”
Section: Assigning Wrong Words to a Cluster
Confidence: 91%
“…For endangered languages (extremely low-resource settings) the lack of training data leads to the problem being framed as a sparse translation problem. This semi-supervised task lies between speech translation and keyword spotting, with cross-lingual supervision being used for word segmentation [30,31,32,33]. Bilingual setups for word segmentation were discussed by [34,35,36,37], but applied to speech transcripts (true phones).…”
Section: Related Work
Confidence: 99%
“…We extend s2t to identify new instances of those prototypes in the unlabeled speech, using a modified version of ZRTools (Jansen et al., 2010), the same UTD toolkit used by UTD-align. Previous work has indicated that using translation text to inform acoustic clustering provides more accurate clusters than just using UTD (Bansal et al., 2017a), so we initially expected that this straightforward extension of s2t would work better than UTD-align. However, early experiments indicated that the text had too much influence on clustering, yielding clusters with highly diverse audio, and thus poor prototypes.…”
Section: Methods
Confidence: 99%