2020
DOI: 10.1007/s10032-020-00360-2
Translating math formula images to LaTeX sequences using deep neural networks with sequence-level training

Abstract: In this paper, we propose a deep neural network model with an encoder-decoder architecture that translates images of math formulas into their LaTeX markup sequences. The encoder is a convolutional neural network (CNN) that transforms images into a group of feature maps. To better capture the spatial relationships of math symbols, the feature maps are augmented with 2D positional encoding before being unfolded into a vector. The decoder is a stacked bidirectional long short-term memory (LSTM) model integrated with …
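The encoder-side step the abstract describes (augment CNN feature maps with a 2D positional encoding, then unfold the spatial grid into a sequence for the decoder) can be sketched as follows. This is a minimal NumPy illustration, not the authors' code; the row/column channel split and all names and shapes (positional_encoding_1d, unfold_with_position, the 16x64x256 feature grid) are assumptions for the sake of the example.

```python
import numpy as np

def positional_encoding_1d(length, dim):
    """Standard sinusoidal positional encoding along one axis (dim must be even)."""
    pos = np.arange(length)[:, None]                 # (length, 1)
    i = np.arange(dim // 2)[None, :]                 # (1, dim/2)
    angles = pos / np.power(10000.0, 2 * i / dim)    # (length, dim/2)
    pe = np.zeros((length, dim))
    pe[:, 0::2] = np.sin(angles)                     # even channels: sine
    pe[:, 1::2] = np.cos(angles)                     # odd channels: cosine
    return pe

def positional_encoding_2d(height, width, dim):
    """2D variant: half the channels encode the row index, half the column index."""
    assert dim % 4 == 0  # each half must itself be even
    pe = np.zeros((height, width, dim))
    pe_h = positional_encoding_1d(height, dim // 2)  # (H, dim/2)
    pe_w = positional_encoding_1d(width, dim // 2)   # (W, dim/2)
    pe[:, :, : dim // 2] = pe_h[:, None, :]          # broadcast across columns
    pe[:, :, dim // 2 :] = pe_w[None, :, :]          # broadcast across rows
    return pe

def unfold_with_position(feature_maps):
    """Add the 2D positional encoding to CNN feature maps, then flatten the
    H x W grid into a sequence of vectors for the LSTM decoder."""
    h, w, c = feature_maps.shape
    augmented = feature_maps + positional_encoding_2d(h, w, c)
    return augmented.reshape(h * w, c)               # (H*W, c) sequence

# Example: a hypothetical 16x64 grid of 256-channel features from a formula image.
features = np.random.randn(16, 64, 256).astype(np.float32)
sequence = unfold_with_position(features)
print(sequence.shape)  # (1024, 256)
```

Adding the encoding before unfolding means each vector in the resulting sequence still carries its original (row, column) location, which is what lets the decoder reason about the 2D layout of math symbols.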

Cited by 61 publications (25 citation statements). References 37 publications.
“…al. [41] for extending the encoding to more than one dimension. The positional encoding for a one-dimensional query space is given by…”
Section: Positional Encoding of Output Query Locations
Confidence: 99%
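The excerpt truncates before the formula itself. The standard one-dimensional sinusoidal encoding this line of work builds on (introduced by Vaswani et al., and presumably what the elided formula states) is:

```latex
PE_{(pos,\,2i)}   = \sin\!\left(\frac{pos}{10000^{2i/d}}\right), \qquad
PE_{(pos,\,2i+1)} = \cos\!\left(\frac{pos}{10000^{2i/d}}\right)
```

where pos is the position along the one-dimensional query axis, i indexes a sine/cosine channel pair, and d is the encoding dimension.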
“…However, as they are translation-invariant by nature, convolutional layers may have difficulty capturing the global structure of a frame, and recent work [41] has shown that positional encodings help neural networks learn position-aware representations. Hence, we add 2D sinusoidal positional encodings, as in [42], to the frame embedding obtained from the residual layers before passing it to the convolutional LSTM units.…”
Section: Methods
Confidence: 99%
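As a rough usage sketch of the step this excerpt describes (adding the same 2D sinusoidal encoding to every frame embedding before the convolutional LSTM), reusing positional_encoding_2d from the sketch after the abstract; the clip shape here is purely illustrative:

```python
import numpy as np

# Assumes positional_encoding_2d from the earlier sketch is in scope.
# frames: (T, H, W, C) residual-layer embeddings of a video clip.
T, H, W, C = 8, 14, 14, 64
frames = np.random.randn(T, H, W, C).astype(np.float32)

pe = positional_encoding_2d(H, W, C)       # (H, W, C), identical for every frame
frames_with_pos = frames + pe[None, ...]   # broadcast over the time axis

# frames_with_pos would then be fed to the convolutional LSTM units.
print(frames_with_pos.shape)  # (8, 14, 14, 64)
```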
“…Following work by Wang and Liu (2021), we used the extended version of 2D positional encoding for 3D.…”
Section: Conclusion and Limitations
Confidence: 99%
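The excerpt does not spell out how the 2D scheme was extended. One common generalization (an assumption here, not necessarily what the citing paper did) splits the channels three ways, with one sinusoidal encoding per axis; this sketch is self-contained:

```python
import numpy as np

def pe_1d(length, dim):
    """Standard sinusoidal encoding along one axis (dim must be even)."""
    pos = np.arange(length)[:, None]
    i = np.arange(dim // 2)[None, :]
    angles = pos / np.power(10000.0, 2 * i / dim)
    pe = np.zeros((length, dim))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

def pe_3d(depth, height, width, dim):
    """One third of the channels per axis; dim % 6 == 0 keeps each third even."""
    assert dim % 6 == 0
    d = dim // 3
    pe = np.zeros((depth, height, width, dim))
    pe[..., :d]    = pe_1d(depth, d)[:, None, None, :]   # depth axis
    pe[..., d:2*d] = pe_1d(height, d)[None, :, None, :]  # height axis
    pe[..., 2*d:]  = pe_1d(width, d)[None, None, :, :]   # width axis
    return pe

print(pe_3d(4, 8, 8, 96).shape)  # (4, 8, 8, 96)
```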