2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)
DOI: 10.1109/asru.2017.8268972
Unwritten languages demand attention too! Word discovery with encoder-decoder models

Cited by 15 publications (23 citation statements). References 21 publications.
“…Different from these related works and inspired by [9], this paper presented word segmentation from speech, in a bilingual setup and for a real language documentation scenario (Mboshi). The proposed approach first performs AUD to generate pseudo-phones from speech, and then uses these units in an encoder-decoder NMT for word segmentation.…”
Section: Discussion
Citation type: mentioning, confidence: 99%
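For orientation, a minimal skeleton of the two-stage pipeline this excerpt describes might look as follows; every name here (run_acoustic_unit_discovery, train_attentional_nmt, soft_alignments, segment_from_attention) is a placeholder injected as a parameter, not an actual tool or API from the paper.

```python
def discover_words(speech_utterances, translations,
                   run_acoustic_unit_discovery, train_attentional_nmt,
                   segment_from_attention):
    """Illustrative skeleton of a bilingual word-discovery pipeline.

    1. AUD turns raw speech into pseudo-phone sequences.
    2. An attentional encoder-decoder is trained between the translations
       (well-resourced language) and the pseudo-phone sequences.
    3. The resulting soft alignments are post-processed into word boundaries.
    """
    pseudo_phones = [run_acoustic_unit_discovery(utt) for utt in speech_utterances]
    model = train_attentional_nmt(source=translations, target=pseudo_phones)
    segmentations = []
    for phones, translation in zip(pseudo_phones, translations):
        attention = model.soft_alignments(translation, phones)  # hypothetical accessor
        segmentations.append(segment_from_attention(phones, attention))
    return segmentations
```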
“…However, there is no similar constraint for the source symbols, as discussed by [5]. Rather than enforcing additional constraints on the alignments, as in the latter reference, we propose to reverse the architecture and to translate from WRL words into UL symbols, following [9]. This "reverse" architecture notably prevents the attention model from ignoring some UL symbols.…”
Section: Word Segmentations From Attention
Citation type: mentioning, confidence: 99%
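A minimal sketch of how attention from such a reversed model can be turned into a segmentation, assuming the soft alignments of a WRL-to-UL model are already available as a NumPy array with one row per UL symbol and one column per WRL word; the function name and interface are illustrative, not from the cited work.

```python
import numpy as np

def segment_from_attention(ul_symbols, attention):
    """Convert soft alignments into a hard word segmentation.

    ul_symbols : sequence of unwritten-language symbols (e.g. pseudo-phones),
                 one per decoder time step.
    attention  : array of shape (len(ul_symbols), n_wrl_words); row t holds
                 the attention weights of UL symbol t over the WRL source words.

    Each UL symbol is assigned to the WRL word it attends to most, and a
    boundary is placed wherever that assignment changes.
    """
    assignments = np.asarray(attention).argmax(axis=1)
    words, current = [], [ul_symbols[0]]
    for sym, prev, curr in zip(ul_symbols[1:], assignments[:-1], assignments[1:]):
        if curr != prev:  # attention shifted to a new WRL word -> word boundary
            words.append("".join(current))
            current = []
        current.append(sym)
    words.append("".join(current))
    return words
```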
“…This semi-supervised task lies between speech translation and keyword spotting, with cross-lingual supervision being used for word segmentation [30,31,32,33]. Bilingual setups for word segmentation were discussed by [34,35,36,37], but applied to speech transcripts (true phones). Among the most relevant to our approach are the works of [24] on speech-to-translation alignment using attentional Neural Machine Translation (NMT) and of [31,32] for language documentation.…”
Section: Related Work
Citation type: mentioning, confidence: 99%
“…• Neural Segmentation (bilingual): the method applied in this paper was presented in [37]. It post-processes an NMT system's soft-alignment probability matrices to generate a hard segmentation.…”
Section: Unsupervised Word Discovery Experiments
Citation type: mentioning, confidence: 99%
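As a toy illustration of this post-processing step (the matrix and symbols below are invented, not data from the paper), taking the argmax of each row picks a translation word for each pseudo-phone, and boundaries fall where that choice changes.

```python
import numpy as np

# Rows: 6 pseudo-phone symbols; columns: 3 translation (WRL) words.
# All values are made up purely for illustration.
soft_alignments = np.array([
    [0.8, 0.1, 0.1],
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.1, 0.8],
    [0.2, 0.1, 0.7],
])
pseudo_phones = ["m", "o", "s", "i", "k", "a"]

assigned_word = soft_alignments.argmax(axis=1)           # [0 0 1 1 2 2]
boundaries = np.flatnonzero(np.diff(assigned_word)) + 1  # [2 4]
segments = np.split(np.array(pseudo_phones), boundaries)
print([" ".join(s) for s in segments])                   # ['m o', 's i', 'k a']
```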