Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019
DOI: 10.18653/v1/D19-1072

Latent Part-of-Speech Sequences for Neural Machine Translation

Abstract: Learning target-side syntactic structure has been shown to improve Neural Machine Translation (NMT). However, incorporating syntax through latent variables introduces additional complexity in inference, as the model must marginalize over the latent syntactic structures. To avoid this, models often resort to greedy search, which explores only a limited portion of the latent space.
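
To make the inference issue concrete (this is the generic latent-variable NMT setup, not necessarily the paper's exact formulation): with a latent POS sequence $z$ for a source sentence $x$ and target sentence $y$, training requires the marginal likelihood

$$p(y \mid x) = \sum_{z} p(z \mid x)\, p(y \mid x, z),$$

where the sum ranges over exponentially many tag sequences. Greedy search sidesteps the sum by committing to a single sequence $\hat{z} = \arg\max_{z} p(z \mid x)$ and scoring only $p(y \mid x, \hat{z})$, which is cheap but explores exactly one point of the latent space.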

Cited by 12 publications (15 citation statements). References 27 publications (38 reference statements).
“…It could also explain the results for the large-scale WMT data, where only recurrent systems were able to take advantage of the linguistic annotations. This hypothesis is also compatible with the results reported on WMT data by Yang et al. (2019), who successfully leveraged TL linguistic annotations in Transformer systems using an ad-hoc architecture.…”
Section: Error Analysis (supporting)
Confidence: 90%
“…The literature mainly contains incomplete evidence. For instance, Yang et al. (2019) conclude that TL part-of-speech annotations boost translation quality with an ad-hoc architecture, but Wagner (2017) claims that TL morpho-syntactic description tags degrade translation quality when they are interleaved: it is not clear whether the difference between the two results is caused by the type of linguistic annotations or by the approach followed to integrate them. There are also contradictory results, such as those reported by Tamchyna et al. (2017), who claim that TL annotations are only useful when they are combined with lemmatisation, and Nadejde et al. (2017), who report positive results without lemmatisation.…”
Section: Introduction (mentioning)
Confidence: 99%
“…Categorical information has achieved great success in neural machine translation, such as part-of-speech (POS) tags in autoregressive translation (Yang et al., 2019) and syntactic labels in non-autoregressive translation (Akoury et al., 2019). Inspired by the broad application of categorical information, we propose to model the implicit categorical information of target words in a non-autoregressive Transformer.…”
Section: Modeling Target Categorical Information by Vector Quantization (mentioning)
Confidence: 99%
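
As a rough illustration of the vector-quantization idea in the statement above (a minimal sketch, not the cited paper's model: the codebook size, dimensions, and the straight-through gradient trick are assumptions for this example), mapping decoder states to a small learned codebook of word categories looks like this:

import torch

def vector_quantize(h, codebook):
    """Map each decoder state to its nearest codebook entry.

    h:        (batch, seq_len, dim) decoder hidden states
    codebook: (num_codes, dim) learned category embeddings
    """
    # Euclidean distance from every state to every code.
    dists = torch.cdist(h, codebook.unsqueeze(0).expand(h.size(0), -1, -1))
    codes = dists.argmin(dim=-1)                 # (batch, seq_len) category ids
    quantized = codebook[codes]                  # (batch, seq_len, dim)
    # Straight-through estimator: the forward pass uses the quantized value,
    # the backward pass sends gradients to h as if quantization were the identity.
    quantized = h + (quantized - h).detach()
    return quantized, codes

codebook = torch.randn(64, 512, requires_grad=True)  # 64 hypothetical categories
states = torch.randn(2, 7, 512)                      # toy decoder states
q, codes = vector_quantize(states, codebook)
print(q.shape, codes.shape)  # torch.Size([2, 7, 512]) torch.Size([2, 7])
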
“…Niehues and Cho apply multi-task learning, in which the encoder of the NMT model is also trained on tasks such as POS tagging and named-entity recognition [15]. There are also works that directly model the syntax of the target sentence during decoding [22][23][24].…”
Section: Related Work (mentioning)
Confidence: 99%