Proceedings of the 22nd Conference on Computational Natural Language Learning 2018
DOI: 10.18653/v1/k18-1010
Pervasive Attention: 2D Convolutional Neural Networks for Sequence-to-Sequence Prediction

Abstract: Current state-of-the-art machine translation systems are based on encoder-decoder architectures that first encode the input sequence and then generate an output sequence based on the input encoding. The two are interfaced with an attention mechanism that recombines a fixed encoding of the source tokens based on the decoder state. We propose an alternative approach which instead relies on a single 2D convolutional neural network across both sequences. Each layer of our network recodes source tokens on the basis …
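The abstract describes a single 2D CNN applied jointly over the source and target sequences, with each layer recoding source tokens conditioned on the output produced so far. Below is a minimal PyTorch sketch of one such layer under that reading; the class and variable names (Conv2DSeq2SeqLayer, grid) and the grid construction are illustrative assumptions, not the authors' reference implementation.

```python
# Minimal sketch of one "pervasive attention"-style layer: source and target
# embeddings are combined into a grid of shape
# (batch, channels, target_len, source_len), and a 2D convolution recodes
# every cell jointly over both axes, causally along the target axis.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Conv2DSeq2SeqLayer(nn.Module):
    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        # Pad fully, then crop so target position t only sees positions <= t.
        self.crop = kernel_size - 1
        self.conv = nn.Conv2d(channels, channels, kernel_size,
                              padding=kernel_size - 1)

    def forward(self, grid: torch.Tensor) -> torch.Tensor:
        # grid: (batch, channels, target_len, source_len)
        t_len, s_len = grid.shape[2], grid.shape[3]
        out = self.conv(grid)
        # Keep the first t_len rows (causal crop on the target axis) and a
        # centered window on the source axis (source context is symmetric).
        out = out[:, :, :t_len, self.crop // 2 : self.crop // 2 + s_len]
        return F.relu(out) + grid  # residual connection

# Build the 2D grid by concatenating source and target embeddings at every
# (target, source) position; sizes here are arbitrary toy values.
batch, s_len, t_len, dim = 2, 7, 5, 16
src = torch.randn(batch, s_len, dim)   # source token embeddings
tgt = torch.randn(batch, t_len, dim)   # target token embeddings
grid = torch.cat(
    [src.unsqueeze(1).expand(-1, t_len, -1, -1),
     tgt.unsqueeze(2).expand(-1, -1, s_len, -1)], dim=-1
).permute(0, 3, 1, 2)                  # (batch, 2*dim, t_len, s_len)

layer = Conv2DSeq2SeqLayer(channels=2 * dim)
print(layer(grid).shape)               # torch.Size([2, 32, 5, 7])
```

Stacking such layers gives every position an effective receptive field over both sequences, which is what makes attention-like behavior "pervasive" rather than confined to a single interface module.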

Cited by 22 publications (6 citation statements)
References 18 publications (21 reference statements)
“…Although researchers have proposed various new NMT architectures, they usually evaluate their models only in terms of overall translation quality and rarely mention how the translation has changed (Gehring et al., 2017; Kalchbrenner et al., 2016; Vaswani et al., 2017). Only a few studies analyze translation quality in terms of sentence length (Elbayad et al., 2018; Zhang et al., 2019). The robustness of recent NMT models on very long sentences remains to be assessed.…”
Section: Related Work
confidence: 99%
“…Only a few studies analyze translation quality in terms of sentence length (Elbayad et al., 2018; Zhang et al., 2019). The robustness of recent NMT models on very long sentences remains to be assessed.…”
Section: Related Work
confidence: 99%
“…The use of LSTMs may also be of concern, as newer methods are emerging in the domain (Bai et al., 2018; Elbayad et al., 2018). We proceeded with LSTMs because of their simplicity, performance, and ability to run quickly on a CPU.…”
Section: Limitations and Future Work
confidence: 99%
“…So, multiple frames must be processed to find the diffusion coefficient. There are various ways of processing multiple frames, such as LSTMs combined with CNNs [33] (which can be difficult to train and parallelize [34]) or CNNs with 3D convolutions (which have more parameters than their 2D counterparts). These architectural elements could be an acceptable choice, but for the current problem the input space can be simplified further.…”
Section: Deep Particle Diffusometry
confidence: 99%
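The parameter comparison in the last statement is easy to verify: for the same kernel width k and channel counts, a 3D convolution carries k times as many weights as its 2D counterpart. A quick illustrative check (channel and kernel sizes here are arbitrary, not taken from the cited paper):

```python
# Parameter counts of matched 2D and 3D convolutions in PyTorch.
import torch.nn as nn

c_in, c_out, k = 64, 64, 3
conv2d = nn.Conv2d(c_in, c_out, k)  # k*k kernel per (in, out) channel pair
conv3d = nn.Conv3d(c_in, c_out, k)  # k*k*k kernel per (in, out) channel pair

n_params = lambda m: sum(p.numel() for p in m.parameters())
print(n_params(conv2d))  # 64*64*3*3   + 64 bias = 36,928
print(n_params(conv3d))  # 64*64*3*3*3 + 64 bias = 110,656
```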