An End-to-End Formula Recognition Method Integrated Attention Mechanism

Zhou, Mingle; Cai, Meifeng; Li, Gang; Li, Min

doi:10.3390/math11010177

Cited by 4 publications

(13 citation statements)

References 25 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…In recent years, these systems have evolved rapidly through deep learning. Usually, transformer-based approaches [2][3][4] have proven to outperform traditional statistical models [15,16] and convolutional neural networks [5,6,[17][18][19]. These neural networks are able to learn and recognize intricate patterns and features within images automatically, making them particularly well-suited for accurately extracting text with subscripts such as mathematical formulas from scanned documents or images [20].…”

Section: Mathematical Expression Recognitionmentioning

confidence: 99%

“…Figure 1 illustrates the two possible processes for transcribing mathematical formulas. In the first approach, the image of the formula is converted into LaTeX code using mathematical expression recognition (MER) [2][3][4][5][6]. This LaTeX code then serves as input for a natural language processing (NLP) model, which produces the transcription of the formula.…”

Section: Introductionmentioning

confidence: 99%

“…Retrieve a complete transcription that covers the whole formula, "sigma is equal to the square root of one divided by n times the sum from i equals one to n of xi minus mu squared"). In the automatic transcription of mathematical formulas, challenge 1 and challenge 2 are usually handled using MER models, which convert the image of the formula into LaTeX code [2][3][4][5][6]. As demonstrated in Figure 3, LaTeX code usually contains commands such as \lim, \sum, and \frac.…”

mentioning

confidence: 99%

See 2 more Smart Citations

Investigating Models for the Transcription of Mathematical Formulas in Images

Feichter,

Schlippe

2024

Applied Sciences

View full text Add to dashboard Cite

The automated transcription of mathematical formulas represents a complex challenge that is of great importance for digital processing and comprehensibility of mathematical content. Consequently, our goal was to analyze state-of-the-art approaches for the transcription of printed mathematical formulas on images into spoken English text. We focused on two approaches: (1) The combination of mathematical expression recognition (MER) models and natural language processing (NLP) models to convert formula images first into LaTeX code and then into text, and (2) the direct conversion of formula images into text using vision-language (VL) models. Since no dataset with printed mathematical formulas and corresponding English transcriptions existed, we created a new dataset, Formula2Text, for fine-tuning and evaluating our systems. Our best system for (1) combines the MER model LaTeX-OCR and the NLP model BART-Base, achieving a translation error rate of 36.14% compared with our reference transcriptions. In the task of converting LaTeX code to text, BART-Base, T5-Base, and FLAN-T5-Base even outperformed ChatGPT, GPT-3.5 Turbo, and GPT-4. For (2), the best VL model, TrOCR, achieves a translation error rate of 42.09%. This demonstrates that VL models, predominantly employed for classical image captioning tasks, possess significant potential for the transcription of mathematical formulas in images.

show abstract

Section: Mathematical Expression Recognitionmentioning

confidence: 99%

Section: Introductionmentioning

confidence: 99%

mentioning

confidence: 99%

See 1 more Smart Citation

Investigating Models for the Transcription of Mathematical Formulas in Images

Feichter,

Schlippe

2024

Applied Sciences

View full text Add to dashboard Cite

show abstract

“…Histogram equalization techniques, such as adaptive histogram equalization (AHE) and contrast-limited adaptive histogram equalization (CLAHE), have been widely used. Recent research has explored integrating deep learning models, such as U-Net and Pix2Pix networks, for adaptive contrast enhancement [25]. These approaches have demonstrated their effectiveness in handling varying illumination conditions and improving OCR performance.…”

Section: Preprocessing Techniques For Image Enhancementmentioning

confidence: 99%

“…Binarization and thresholding techniques are also critical in OCR preprocessing. Binarization converts grayscale or color images into binary representations, separating foreground characters from the background [25]. Various thresholding techniques, including global thresholding, local adaptive thresholding, and hybrid methods, have been proposed to address different image characteristics and challenges.…”

Section: Preprocessing Techniques For Image Enhancementmentioning

confidence: 99%

Advancing OCR Accuracy in Image-to-LaTeX Conversion—A Critical and Creative Exploration

Orji,

Haydar,

Erşan

et al. 2023

Applied Sciences

View full text Add to dashboard Cite

This paper comprehensively assesses the application of active learning strategies to enhance natural language processing-based optical character recognition (OCR) models for image-to-LaTeX conversion. It addresses the existing limitations of OCR models and proposes innovative practices to strengthen their accuracy. Key components of this study include the augmentation of training data with LaTeX syntax constraints, the integration of active learning strategies, and the employment of active learning feedback loops. This paper first examines the current weaknesses of OCR models with a particular focus on symbol recognition, complex equation handling, and noise moderation. These limitations serve as a framework against which the subsequent research methodologies are assessed. Augmenting the training data with LaTeX syntax constraints is a crucial strategy for improving model precision. Incorporating symbol relationships, wherein contextual information is considered during recognition, further enriches the error correction. This paper critically examines the application of active learning strategies. The active learning feedback loop leads to progressive improvements in accuracy. This article underlines the importance of uncertainty and diversity sampling in sample selection, ensuring that the dynamic learning process remains efficient and effective. Appropriate evaluation metrics and ensemble techniques are used to improve the operational learning effectiveness of the OCR model. These techniques allow the model to adapt and perform more effectively in diverse application domains, further extending its utility.

show abstract