Detecting In-line Mathematical Expressions in Scientific Documents

Iwatsuki, Kazuhiko; Sagara, Takeshi; Hara, Tatsunori; Aizawa, Akiko

doi:10.1145/3103010.3121041

Cited by 21 publications

(26 citation statements)

References 4 publications


“…After the word extraction process, word features and conditional random field (CRF) are used for inline expression detection. The achieved accuracy in detection is 88.95% on PDF files from the ACL Anthology dataset [31] but there are still many errors in the detection of variables reported in the research.…”

Section: Mathematical Expression Detection in Native PDF Documents (mentioning)

confidence: 95%

“…In recent years, several researches [21], [31], [32] have focused on the detection of mathematical expressions in PDF documents. For PDF documents, metadata information of textual words such as font, size, styles can be extracted precisely.…”

Section: Mathematical Expression Detection in Native PDF Documents (mentioning)

confidence: 99%

“…Therefore, the detection of mathematical expressions in PDF documents is more accurate than that of image-based documents. The method reported in [31] extracts inline expressions in PDF documents with the use of natural language processing. After the word extraction process, word features and conditional random field (CRF) are used for inline expression detection.…”

Section: Mathematical Expression Detection in Native PDF Documents (mentioning)

confidence: 99%
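The statements above summarize [31] as word-level features fed to a CRF. As a rough illustration only, the sketch below decodes a two-label linear-chain CRF (TEXT vs. MATH) over a word sequence with the Viterbi algorithm. The `Word` type, the feature set, the font-name heuristic, and every weight are hypothetical and hand-set rather than learned; this is not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Dict, List

LABELS = ("TEXT", "MATH")


@dataclass
class Word:
    text: str
    font: str     # font name as reported by a PDF text extractor (assumed available)
    italic: bool


def emission_features(word: Word) -> Dict[str, float]:
    """Hypothetical binary features for a single word."""
    return {
        "bias": 1.0,
        "italic": float(word.italic),
        # Crude heuristic: TeX math-italic fonts are often named CMMI*
        "math_font": float("CMMI" in word.font or "Math" in word.font),
        "single_char": float(len(word.text) == 1),
        "has_symbol": float(any(c in "=+<>^_" for c in word.text)),
        "numeric": float(word.text.isdigit()),
    }


# Hand-set, hypothetical weights; a real CRF learns these from labeled data.
EMIT_W = {
    "TEXT": {"bias": 1.0, "italic": -0.5, "math_font": -2.0,
             "single_char": -0.5, "has_symbol": -2.0, "numeric": -0.5},
    "MATH": {"bias": -1.0, "italic": 1.0, "math_font": 2.0,
             "single_char": 1.0, "has_symbol": 2.0, "numeric": 0.5},
}
# Transition weights reward keeping the same label across adjacent words.
TRANS_W = {("TEXT", "TEXT"): 0.5, ("TEXT", "MATH"): -0.5,
           ("MATH", "TEXT"): -0.5, ("MATH", "MATH"): 0.5}


def emission_score(label: str, feats: Dict[str, float]) -> float:
    return sum(EMIT_W[label][name] * value for name, value in feats.items())


def viterbi(words: List[Word]) -> List[str]:
    """Return the highest-scoring TEXT/MATH label sequence for the words."""
    if not words:
        return []
    feats = [emission_features(w) for w in words]
    best = [{y: emission_score(y, feats[0]) for y in LABELS}]
    backptrs: List[Dict[str, str]] = []
    for t in range(1, len(words)):
        scores, ptrs = {}, {}
        for y in LABELS:
            # Best previous label given that the current label is y.
            prev = max(LABELS, key=lambda p: best[-1][p] + TRANS_W[(p, y)])
            scores[y] = best[-1][prev] + TRANS_W[(prev, y)] + emission_score(y, feats[t])
            ptrs[y] = prev
        best.append(scores)
        backptrs.append(ptrs)
    # Trace back from the best final label.
    path = [max(LABELS, key=lambda y: best[-1][y])]
    for ptrs in reversed(backptrs):
        path.append(ptrs[path[-1]])
    return path[::-1]


words = [Word("Let", "Times-Roman", False), Word("x", "CMMI10", True),
         Word("=", "CMR10", False), Word("2", "CMR10", False),
         Word("here", "Times-Roman", False)]
print(viterbi(words))  # ['TEXT', 'MATH', 'MATH', 'MATH', 'TEXT']
```

Note that the digit "2" is labeled MATH here only because of the hand-set `numeric` weight; with learned weights, context from neighboring labels does this work.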


A Hybrid Method for Mathematical Expression Detection in Scientific Document Images

Phong; Hoang

2020

IEEE Access


Mathematical expressions have been widely used in scientific documents. In order to analyze the documents, automatic detection of mathematical expressions is a crucial step. The paper presents a unified system for the detection of mathematical expressions including both inline and isolated expressions in scientific document images that usually consist of heterogeneous components (e.g., figures, tables, text and expressions). In the system, a hybrid method of two stages is proposed for the effective detection of mathematical expressions. First, the layout analysis of entire document images is introduced to improve the accuracy of text line and word segmentation. Then, both isolated and inline expressions in document images are detected. Both hand-crafted and deep learning features are extensively investigated and combined to improve the detection accuracy. Furthermore, a generic performance metric is applied to evaluate the system comprehensively. The proposed method has been evaluated on two public benchmark datasets (Marmot and GTDB). The obtained accuracies of isolated and inline expressions in the Marmot dataset are 91.18% and 81.35% while those in the GTDB dataset are 89.51% and 80.20%, respectively. The performance comparison is carried out with the conventional methods to show the outstanding effectiveness of the proposed system. Moreover, extensive experiments have been performed in order to point out the effect of document image resolution and post processing techniques on mathematical expression detection.

Index Terms: Mathematical expression detection, document analysis, machine learning, neural network, fusion technique.



“…Document image processing is an interesting topic among the computer vision research community. Significant progress has been made in this domain, including heuristic-based, convolutional neural network (CNN) based, statistics-based-like conditional random fields (CRFs) and graph trees, and/or a combination of these methods [7, 8, 14–16]. Heuristics include color-based features, shape-based features, geometric features, and keypoint descriptors.…”

Section: Related Work (mentioning)

confidence: 99%

“…Iwatsuki et al [14] presented a CRF based method to extract formulas and mathematical zones from PDF documents. Their method uses layout features like font, style, and linguistic features such as n-gram context to build their CRF model.…”

Section: Related Work (mentioning)

confidence: 99%
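The statement above describes the CRF input as layout features (font, style) plus linguistic n-gram context. A minimal sketch of per-token feature dictionaries of that kind, in the shape commonly handed to CRF toolkits, could look like the following. The token tuple layout, the feature names, the window size, and the `<PAD>`/`<BOS>` markers are all assumptions for illustration, not the feature templates of Iwatsuki et al.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical token layout: (text, font_name, is_bold, is_italic), as might be
# produced by a PDF text extractor.
Token = Tuple[str, str, bool, bool]
Feature = Union[str, int, bool]


def token_features(tokens: List[Token], i: int, window: int = 2) -> Dict[str, Feature]:
    """Layout features of token i plus its surrounding n-gram context."""
    text, font, bold, italic = tokens[i]
    feats: Dict[str, Feature] = {
        "word.lower": text.lower(),
        "font": font,
        "bold": bold,
        "italic": italic,
        "length": len(text),
    }
    # Unigram context: neighboring words within +/- window positions.
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        j = i + offset
        neighbor = tokens[j][0].lower() if 0 <= j < len(tokens) else "<PAD>"
        feats[f"word[{offset:+d}]"] = neighbor
    # Bigram formed by the previous word and the current word.
    prev = tokens[i - 1][0].lower() if i > 0 else "<BOS>"
    feats["bigram[-1,0]"] = f"{prev}|{text.lower()}"
    return feats


def sequence_features(tokens: List[Token], window: int = 2) -> List[Dict[str, Feature]]:
    """One feature dictionary per token, in sequence order."""
    return [token_features(tokens, i, window) for i in range(len(tokens))]


tokens = [("Let", "Times-Roman", False, False),
          ("x", "CMMI10", False, True),
          ("be", "Times-Roman", False, False)]
for f in sequence_features(tokens):
    print(f)
```

Each dictionary corresponds to one position in the sequence a linear-chain CRF would label; keeping layout and context features in the same dictionary lets the model weigh, for example, an italic single letter differently depending on its neighbors.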

Fi-Fo Detector: Figure and Formula Detection Using Deformable Networks

et al. 2020


We propose a novel hybrid approach that fuses traditional computer vision techniques with deep learning models to detect figures and formulas from document images. The proposed approach first fuses the different computer vision based image representations, i.e., color transform, connected component analysis, and distance transform, termed as Fi-Fo image representation. The Fi-Fo image representation is then fed to deep models for further refined representation-learning for detecting figures and formulas from document images. The proposed approach is evaluated on a publicly available ICDAR-2017 Page Object Detection (POD) dataset and its corrected version. It produces the state-of-the-art results for formula and figure detection in document images with an f1-score of 0.954 and 0.922, respectively. Ablation study results reveal that the Fi-Fo image representation helps in achieving superior performance in comparison to raw image representation. Results also establish that the hybrid approach helps deep models to learn more discriminating and refined features.
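The abstract describes fusing several classical image representations into one multi-channel input before the deep model. The pure-Python sketch below shows the general idea of such a fusion on a tiny grayscale image: one channel per transform, stacked per pixel. The specific choices are assumptions: the "color transform" channel is stood in for by inverted grayscale intensity, the connected-component channel stores each ink pixel's 4-connected component area, the distance channel is the city-block distance to the nearest ink pixel, and the binarization threshold of 128 is hypothetical. The deformable networks that consume this representation are not shown.

```python
from collections import deque
from typing import List, Tuple

Image = List[List[int]]  # grayscale, 0 = black ink, 255 = white paper


def binarize(img: Image, threshold: int = 128) -> List[List[int]]:
    """1 for ink pixels (darker than threshold), 0 for background."""
    return [[1 if v < threshold else 0 for v in row] for row in img]


def component_areas(fg: List[List[int]]) -> List[List[int]]:
    """Label 4-connected ink components by BFS; store each pixel's component area."""
    h, w = len(fg), len(fg[0])
    area = [[0] * w for _ in range(h)]
    seen = [[False] * w for _ in range(h)]
    for sy in range(h):
        for sx in range(w):
            if fg[sy][sx] and not seen[sy][sx]:
                pixels, queue = [], deque([(sy, sx)])
                seen[sy][sx] = True
                while queue:
                    y, x = queue.popleft()
                    pixels.append((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and fg[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                for y, x in pixels:
                    area[y][x] = len(pixels)
    return area


def l1_distance_to_ink(fg: List[List[int]]) -> List[List[int]]:
    """Exact city-block distance to the nearest ink pixel (two-pass algorithm)."""
    h, w = len(fg), len(fg[0])
    inf = h + w  # larger than any possible L1 distance inside the image
    d = [[0 if fg[y][x] else inf for x in range(w)] for y in range(h)]
    for y in range(h):  # forward pass: top and left neighbors
        for x in range(w):
            if y > 0:
                d[y][x] = min(d[y][x], d[y - 1][x] + 1)
            if x > 0:
                d[y][x] = min(d[y][x], d[y][x - 1] + 1)
    for y in range(h - 1, -1, -1):  # backward pass: bottom and right neighbors
        for x in range(w - 1, -1, -1):
            if y < h - 1:
                d[y][x] = min(d[y][x], d[y + 1][x] + 1)
            if x < w - 1:
                d[y][x] = min(d[y][x], d[y][x + 1] + 1)
    return d


def fused_representation(img: Image) -> List[List[Tuple[int, int, int]]]:
    """Stack (inverted intensity, component area, distance) per pixel."""
    fg = binarize(img)
    area, dist = component_areas(fg), l1_distance_to_ink(fg)
    return [[(255 - img[y][x], area[y][x], dist[y][x]) for x in range(len(img[0]))]
            for y in range(len(img))]


img = [[255, 0, 255],
       [255, 0, 255],
       [255, 255, 0]]
for row in fused_representation(img):
    print(row)
```

The vertical stroke forms one component of area 2, while the diagonal pixel is a separate component under 4-connectivity; each channel thus encodes a different cue (ink darkness, object size, proximity to ink) for the downstream detector.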


Mathematical Variable Detection in PDF Scientific Documents

Phong; Hoang; et al. 2019

Intelligent Information and Database Systems

