2019
DOI: 10.1007/978-3-030-10925-7_2

Image-to-Markup Generation via Paired Adversarial Learning

Cited by 42 publications (27 citation statements); References 17 publications.
“…Our formula detector is based on graph-theoretic methods for determining the position of multi-character formulae, in combination with statistical and context-recognition-based approaches for detecting single-character mathematical symbols inside text [12]. [1,2] CASIA/NLPR PAL Group: We entered two systems: PAL, and PAL-v2, which extends our previous work [13]. The attention-based encoder-decoder model in PAL-v2 is trained using official data only.…”
Section: Participating Methods (mentioning; confidence: 99%)
“…We augment the training data set using rotations, perspective shift, distortion, and bevel, as well as the decomposition operation introduced by Le et al. [8]. This expanded the training data to 330k images, which were then used for Paired Adversarial Learning [13]. An ensemble of 6 models with different initializations produced the PAL-v2 results.…”
Section: Participating Methods (mentioning; confidence: 99%)
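The augmentation pipeline quoted above (rotations, perspective shift, distortion, bevel) can be approximated with standard image-processing transforms. The following is a minimal sketch using torchvision, with assumed parameter values and a shear transform standing in for the distortion/bevel steps; it is not the actual PAL-v2 pipeline, and the decomposition operation of Le et al. is omitted.

```python
# Hypothetical augmentation sketch for formula images (illustrative only):
# rotation and perspective shift via standard torchvision transforms, with a
# shear as a rough stand-in for the distortion/bevel operations.
from PIL import Image
import torchvision.transforms as T

augment = T.Compose([
    T.RandomRotation(degrees=10, fill=255),                      # small in-plane rotations
    T.RandomPerspective(distortion_scale=0.3, p=0.5, fill=255),  # perspective shift
    T.RandomAffine(degrees=0, shear=10, fill=255),               # shear as a distortion proxy
])

def expand_dataset(images, copies_per_image=3):
    """Return each original image plus several augmented copies of it."""
    out = []
    for img in images:
        out.append(img)
        out.extend(augment(img) for _ in range(copies_per_image))
    return out

# Example: expanded = expand_dataset([Image.open("formula.png").convert("L")])
```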
“…Extending traditional text recognition, some authors transform images of tables (Zhong et al, 2019;Deng et al, 2019) and mathematical formulas (Deng et al, 2017;Wu et al, 2018) into their LaTeX or HTML representations. After applying a convolutional encoder to the input image, they use a forward RNN based decoder to generate tokens in the target language.…”
Section: Structured Language Generation (mentioning; confidence: 99%)
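The pattern described in that statement, a convolutional encoder over the image followed by a recurrent decoder that emits markup tokens, can be sketched generically. This is a simplified illustration with assumed layer sizes and no attention mechanism (which the cited approaches do use), not the architecture of the PAL paper or of the other cited systems.

```python
# Minimal sketch of an image-to-markup encoder-decoder (assumed shapes and
# sizes): a CNN encodes the formula image into a feature vector, and a GRU
# decoder emits markup tokens conditioned on that vector.
import torch
import torch.nn as nn

class ImageToMarkup(nn.Module):
    def __init__(self, vocab_size, hidden=256):
        super().__init__()
        self.encoder = nn.Sequential(                      # CNN encoder
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),                       # pool to one vector per image
        )
        self.embed = nn.Embedding(vocab_size, hidden)
        self.init_h = nn.Linear(128, hidden)               # image feature -> initial GRU state
        self.decoder = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, images, token_ids):
        """Teacher-forced training pass: images (B,1,H,W), token_ids (B,T)."""
        feat = self.encoder(images).flatten(1)                  # (B, 128)
        h0 = torch.tanh(self.init_h(feat)).unsqueeze(0)         # (1, B, hidden)
        dec_out, _ = self.decoder(self.embed(token_ids), h0)    # (B, T, hidden)
        return self.out(dec_out)                                # (B, T, vocab) logits

# Example: logits = ImageToMarkup(vocab_size=500)(
#     torch.rand(2, 1, 64, 256), torch.randint(0, 500, (2, 20)))
```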
“…[12] proposed a coarse-to-fine attention to improve efficiency. In addition, [31] introduced a PAL model and employed an adversarial learning strategy during training.…”
Section: Attention-Based Encoder-Decoder Approaches for HMER (mentioning; confidence: 99%)
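The adversarial strategy mentioned in that statement can be illustrated schematically: a discriminator is trained to separate encoder features coming from two domains (for example handwritten images and paired printed templates), while the encoder is trained to fool it, pushing the two feature distributions together. The sketch below uses assumed module interfaces and a plain BCE objective; it is a generic adversarial feature-alignment step, not the exact formulation of the PAL paper.

```python
# Schematic adversarial feature-alignment step (assumed setup, not the exact
# PAL objective): discriminator D tries to separate features of handwritten
# images from features of paired printed templates; the encoder is then
# updated so that handwritten features fool D.
import torch
import torch.nn as nn

def adversarial_step(encoder, D, opt_enc, opt_d, handwritten, printed):
    """encoder: images -> (B, feat_dim); D: features -> (B, 1) logits."""
    bce = nn.BCEWithLogitsLoss()
    f_hw, f_pr = encoder(handwritten), encoder(printed)

    # 1) Update the discriminator: printed -> 1, handwritten -> 0.
    d_loss = bce(D(f_pr.detach()), torch.ones(f_pr.size(0), 1)) + \
             bce(D(f_hw.detach()), torch.zeros(f_hw.size(0), 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # 2) Update the encoder so handwritten features are classified as printed.
    g_loss = bce(D(f_hw), torch.ones(f_hw.size(0), 1))
    opt_enc.zero_grad(); g_loss.backward(); opt_enc.step()
    return d_loss.item(), g_loss.item()
```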
“…The system UPV denotes the best system among all submissions to the CROHME 2014 competition, while the system Wiris denotes the best system among all submissions to the CROHME 2016 competition (only using the official training dataset); details can be found in [47,48]. Details of WYGIWYS and PAL can be found in [12] and [31], respectively. Please note that the results of the end-to-end approaches are not exactly comparable with the traditional approaches submitted to the CROHME competitions, as the segmentation error is not explicitly considered.…”
Section: Evaluation of Multi-modal Scan (Q2) (mentioning; confidence: 99%)