2020
DOI: 10.14712/00326585.005

Every Layer Counts: Multi-Layer Multi-Head Attention for Neural Machine Translation

Cited by 2 publications (2 citation statements)
References 20 publications
“…The suggested RPE MHA NMT model presents a strategy for analyzing how terms in the resulting translation align with terms in the source sentence. This strategy is illustrated in Figure 9 by visualizing the annotation weights [29,30]. Each row of the matrix displays the annotation weights, with the x-axis representing the AD sentence and the y-axis representing the resulting MSA sentence.…”
Section: Attention Analysis (mentioning)
confidence: 99%
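The citation above describes the standard way attention alignments are inspected: the attention matrix is drawn as a heatmap whose rows correspond to target tokens and whose columns correspond to source tokens. Below is a minimal Python sketch of that kind of visualization; it is not code from the cited papers, and the token lists and random weight matrix are placeholder assumptions.

import numpy as np
import matplotlib.pyplot as plt

def plot_attention(weights, source_tokens, target_tokens):
    # weights: array of shape (len(target_tokens), len(source_tokens));
    # each row is one target token's attention distribution over the source.
    fig, ax = plt.subplots()
    ax.imshow(weights, cmap="viridis", aspect="auto")
    ax.set_xticks(range(len(source_tokens)))
    ax.set_xticklabels(source_tokens, rotation=90)
    ax.set_yticks(range(len(target_tokens)))
    ax.set_yticklabels(target_tokens)
    ax.set_xlabel("source (AD) sentence")
    ax.set_ylabel("resulting (MSA) sentence")
    fig.tight_layout()
    plt.show()

# Toy example: random, row-normalised weights stand in for a real
# model's attention output (hypothetical tokens).
src = ["w1", "w2", "w3", "w4"]
tgt = ["t1", "t2", "t3"]
w = np.random.rand(len(tgt), len(src))
plot_attention(w / w.sum(axis=1, keepdims=True), src, tgt)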
“…For some small languages and dialects, there is a shortage of relevant translation talent. With the help of English machine translation, translation quality can meet basic task requirements, making up for the shortage of qualified translators [13][14][15]. When the number of translations is small, the cost difference between manual translation and English machine translation is not particularly obvious.…”
Section: Introduction (mentioning)
confidence: 99%