2020
DOI: 10.48550/arxiv.2010.02358
Preprint
VisualWordGrid: Information Extraction From Scanned Documents Using A Multimodal Approach

Abstract: We introduce a novel approach to scanned-document representation for field extraction. It simultaneously encodes the textual, visual, and layout information in a 3D matrix used as input to a segmentation model. We improve on the recent Chargrid and Wordgrid [1] models in several ways: first by taking the visual modality into account, then by boosting robustness on small datasets while keeping inference time low. Our approach is tested on public and private document-image …

Cited by 4 publications (5 citation statements)
References 11 publications
“…Chargrid (Katti et al., 2018) uses a convolution-based encoder-decoder network to fuse text information into images by performing one-hot encoding on characters. VisualWordGrid (Kerroumi et al., 2020) implements Wordgrid (Katti et al., 2018) by replacing character-level text information with word-level word2vec features, and fuses in visual information to improve extraction performance. BERTgrid (Denk & Reisswig, 2019) uses BERT to obtain contextual text representations, which further improves end-to-end accuracy.…”
Section: Visual Information Extraction (mentioning)
confidence: 99%
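The grid encodings contrasted in this excerpt can be sketched concretely. Below is a minimal, hypothetical Chargrid-style encoder: each OCR'd character's bounding box is filled with that character's one-hot vector in an H × W × V tensor. The toy vocabulary, the `(char, x0, y0, x1, y1)` tuple format, and the grid size are illustrative assumptions, not the papers' actual implementations.

```python
import numpy as np

# Toy character vocabulary; real systems use a larger, task-specific one.
VOCAB = "abcdefghijklmnopqrstuvwxyz0123456789"
CHAR2IDX = {c: i for i, c in enumerate(VOCAB)}

def chargrid(ocr_chars, height, width):
    """Build an (H, W, V) one-hot character grid.

    ocr_chars: list of (char, x0, y0, x1, y1) boxes in pixel coordinates,
    as they might come from an OCR engine (format assumed for illustration).
    """
    grid = np.zeros((height, width, len(VOCAB)), dtype=np.float32)
    for ch, x0, y0, x1, y1 in ocr_chars:
        idx = CHAR2IDX.get(ch.lower())
        if idx is None:
            continue  # character outside the toy vocabulary
        # Fill the character's bounding box with its one-hot channel.
        grid[y0:y1, x0:x1, idx] = 1.0
    return grid

g = chargrid([("A", 2, 3, 5, 7)], height=10, width=10)
```

A Wordgrid/VisualWordGrid variant would replace the one-hot axis with a word-embedding dimension (e.g. word2vec vectors) and fill whole word boxes rather than individual character boxes.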
“…The research study [19] used the RVL-CDIP dataset, which includes scanned document images of different categories, with invoices as one of the categories. It contains 25,000 images per category.…”
Section: Related Datasets (mentioning)
confidence: 99%
“…Grid-based methods [1], [7], [9] exploit the textual and spatial information of a document by building a grid in which pixels are encoded using character- or token-level embeddings. This grid is then fed to a convolutional encoder-decoder network.…”
Section: Introduction (mentioning)
confidence: 99%
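The multimodal fusion VisualWordGrid adds on top of such a grid can be sketched as a channel-wise concatenation of the page image with the text-embedding grid, yielding the single 3D input tensor mentioned in the abstract. The shapes and the embedding dimension used here are illustrative assumptions.

```python
import numpy as np

def fuse(image_rgb, word_grid):
    """Concatenate visual and textual channels into one 3D input tensor.

    image_rgb: (H, W, 3) scanned page, values in [0, 1].
    word_grid: (H, W, D) word-embedding grid aligned to the same pixels.
    Returns an (H, W, 3 + D) tensor for a segmentation network.
    """
    assert image_rgb.shape[:2] == word_grid.shape[:2]
    return np.concatenate([image_rgb, word_grid], axis=-1)

# Example with an assumed embedding dimension D = 8.
x = fuse(np.zeros((64, 64, 3)), np.zeros((64, 64, 8)))
# x.shape == (64, 64, 11)
```

A segmentation model then consumes the fused tensor and predicts a field label per pixel, which is how grid-based methods cast field extraction as semantic segmentation.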
“…[1]), or by passing the image through a separate encoder (e.g. [9]). One-hot character encoding was used in [1], while in [9] static word embeddings were utilized.…”
Section: Introduction (mentioning)
confidence: 99%