2020
DOI: 10.1007/s10032-020-00360-2
Translating math formula images to LaTeX sequences using deep neural networks with sequence-level training

Abstract: In this paper, we propose a deep neural network model with an encoder-decoder architecture that translates images of math formulas into their LaTeX markup sequences. The encoder is a convolutional neural network (CNN) that transforms images into a group of feature maps. To better capture the spatial relationships of math symbols, the feature maps are augmented with 2D positional encoding before being unfolded into a vector. The decoder is a stacked bidirectional long short-term memory (LSTM) model integrated with …
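The encoder-side step the abstract describes (augment CNN feature maps with a 2D positional encoding, then unfold the spatial grid into a sequence for the decoder) can be sketched as follows. This is a minimal NumPy illustration, not the authors' code; the row/column channel split and all names and shapes (positional_encoding_1d, unfold_with_position, the 16x64x256 feature grid) are assumptions for the sake of the example.

```python
import numpy as np

def positional_encoding_1d(length, dim):
    """Standard sinusoidal positional encoding along one axis (dim must be even)."""
    pos = np.arange(length)[:, None]                 # (length, 1)
    i = np.arange(dim // 2)[None, :]                 # (1, dim/2)
    angles = pos / np.power(10000.0, 2 * i / dim)    # (length, dim/2)
    pe = np.zeros((length, dim))
    pe[:, 0::2] = np.sin(angles)                     # even channels: sine
    pe[:, 1::2] = np.cos(angles)                     # odd channels: cosine
    return pe

def positional_encoding_2d(height, width, dim):
    """2D variant: half the channels encode the row index, half the column index."""
    assert dim % 4 == 0  # each half must itself be even
    pe = np.zeros((height, width, dim))
    pe_h = positional_encoding_1d(height, dim // 2)  # (H, dim/2)
    pe_w = positional_encoding_1d(width, dim // 2)   # (W, dim/2)
    pe[:, :, : dim // 2] = pe_h[:, None, :]          # broadcast across columns
    pe[:, :, dim // 2 :] = pe_w[None, :, :]          # broadcast across rows
    return pe

def unfold_with_position(feature_maps):
    """Add the 2D positional encoding to CNN feature maps, then flatten the
    H x W grid into a sequence of vectors for the LSTM decoder."""
    h, w, c = feature_maps.shape
    augmented = feature_maps + positional_encoding_2d(h, w, c)
    return augmented.reshape(h * w, c)               # (H*W, c) sequence

# Example: a hypothetical 16x64 grid of 256-channel features from a formula image.
features = np.random.randn(16, 64, 256).astype(np.float32)
sequence = unfold_with_position(features)
print(sequence.shape)  # (1024, 256)
```

Adding the encoding before unfolding means each vector in the resulting sequence still carries its original (row, column) location, which is what lets the decoder reason about the 2D layout of math symbols.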

Cited by 61 publications (25 citation statements). References 37 publications.
“…al. [41] for extending the encoding to more than one dimension. The positional encoding for a one-dimensional query space is given by…”
Section: Positional Encoding of Output Query Locations
Confidence: 99%
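The excerpt truncates before the formula itself. The standard one-dimensional sinusoidal encoding this line of work builds on (introduced by Vaswani et al., and presumably what the elided formula states) is:

```latex
PE_{(pos,\,2i)}   = \sin\!\left(\frac{pos}{10000^{2i/d}}\right), \qquad
PE_{(pos,\,2i+1)} = \cos\!\left(\frac{pos}{10000^{2i/d}}\right)
```

where pos is the position along the one-dimensional query axis, i indexes a sine/cosine channel pair, and d is the encoding dimension.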
“…However, as they are translation-invariant by nature, convolutional layers may have difficulty capturing the global structure of a frame, and recent work [41] has shown that positional encodings help neural networks learn position-aware representations. Hence, we add 2D sinusoidal positional encodings, as in [42], to the frame embedding obtained from the residual layers before passing it to the convolutional LSTM units.…”
Section: Methods
Confidence: 99%
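As a rough usage sketch of the step this excerpt describes (adding the same 2D sinusoidal encoding to every frame embedding before the convolutional LSTM), reusing positional_encoding_2d from the sketch after the abstract; the clip shape here is purely illustrative:

```python
import numpy as np

# Assumes positional_encoding_2d from the earlier sketch is in scope.
# frames: (T, H, W, C) residual-layer embeddings of a video clip.
T, H, W, C = 8, 14, 14, 64
frames = np.random.randn(T, H, W, C).astype(np.float32)

pe = positional_encoding_2d(H, W, C)       # (H, W, C), identical for every frame
frames_with_pos = frames + pe[None, ...]   # broadcast over the time axis

# frames_with_pos would then be fed to the convolutional LSTM units.
print(frames_with_pos.shape)  # (8, 14, 14, 64)
```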
“…Following work by Wang and Liu (2021), we used the extended version of 2D positional encoding for 3D.…”
Section: Conclusion and Limitations
Confidence: 99%
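The excerpt does not spell out how the 2D scheme was extended. One common generalization (an assumption here, not necessarily what the citing paper did) splits the channels three ways, with one sinusoidal encoding per axis; this sketch is self-contained:

```python
import numpy as np

def pe_1d(length, dim):
    """Standard sinusoidal encoding along one axis (dim must be even)."""
    pos = np.arange(length)[:, None]
    i = np.arange(dim // 2)[None, :]
    angles = pos / np.power(10000.0, 2 * i / dim)
    pe = np.zeros((length, dim))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

def pe_3d(depth, height, width, dim):
    """One third of the channels per axis; dim % 6 == 0 keeps each third even."""
    assert dim % 6 == 0
    d = dim // 3
    pe = np.zeros((depth, height, width, dim))
    pe[..., :d]    = pe_1d(depth, d)[:, None, None, :]   # depth axis
    pe[..., d:2*d] = pe_1d(height, d)[None, :, None, :]  # height axis
    pe[..., 2*d:]  = pe_1d(width, d)[None, None, :, :]   # width axis
    return pe

print(pe_3d(4, 8, 8, 96).shape)  # (4, 8, 8, 96)
```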