2021
DOI: 10.48550/arxiv.2106.12940
Preprint
MatchVIE: Exploiting Match Relevancy between Entities for Visual Information Extraction

Abstract: The Visual Information Extraction (VIE) task aims to extract key information from diverse document images (e.g., invoices and purchase receipts). Most previous methods treat the VIE task simply as a sequence-labeling or classification problem, which requires models to carefully identify each kind of semantics by introducing multimodal features such as font, color, and layout. But simply introducing multimodal features does not work well when faced with numeric semantic categories or ambiguous texts.…

Cited by 2 publications (4 citation statements)
References 20 publications
“…The loss function may be described as Equation 6, which incorporates a TSR loss, a CTC loss, and a hyperparameter to balance the two, since the proposed technique uses a multi-task architecture, which means that all the parameters should be trained jointly. In our implementation, we employ cross-entropy loss, which is defined as Equation 7 in terms of the prediction and the matching ground truth.…”
Section: B. Multi-task Table Structure Recognition and Table Cell
confidence: 99%
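The citation above describes a multi-task objective: a TSR loss and a CTC loss combined through a balancing hyperparameter, with cross-entropy as the per-prediction loss. The excerpt omits the actual symbols, so the names below (`lam`, `cross_entropy`, `multitask_loss`) and the weighted-sum form are illustrative assumptions, not the paper's exact formulation:

```python
import math

def cross_entropy(pred_probs, target_idx):
    """Cross-entropy for a single prediction: -log p(target).

    pred_probs is a probability distribution over classes;
    target_idx is the index of the ground-truth class.
    """
    return -math.log(pred_probs[target_idx])

def multitask_loss(tsr_loss, ctc_loss, lam=0.5):
    """Combined multi-task objective in the spirit of Equation 6:
    total = TSR loss + lam * CTC loss, trained jointly."""
    return tsr_loss + lam * ctc_loss

# Example: a confident correct prediction yields a small cross-entropy,
# and the two task losses are blended by the hyperparameter lam.
ce = cross_entropy([0.25, 0.5, 0.25], 1)   # -log(0.5)
total = multitask_loss(ce, 2.0, lam=0.5)
```

Joint training of both heads (rather than alternating per-task updates) is what the quoted passage means by "all the parameters should be trained jointly": one backward pass through the combined scalar loss updates the shared backbone and both task-specific heads.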
“…It is important to note that the majority of current research [1,2,3] on CTC problems focuses on spreadsheet tables, which makes the problem description considerably simpler because spreadsheets can store more meta-information and their default units are cells. Some studies [4], [5] have attempted to extract entities and information from images, which is similar to how we characterised the CTC issue. However, because they have not paid attention to the tables in document images, their problem definitions cannot be solved jointly with the TSR problem.…”
Section: Introduction
confidence: 97%
“…However, the above methods only consider textual feature information, ignoring auxiliary information such as the visual and layout information in documents. Some GNN-based methods [4]-[6] applied multi-modal features of text segments as nodes to build the document graph and adopted the edge relationships of the graph network to infer the information relevance between neighboring nodes. Furthermore, TRIE [18] introduces an end-to-end network, which adopts a multi-modal context block to bridge the OCR and IE (information extraction) modules.…”
Section: Related Work A. Document Key Information Extraction
confidence: 99%
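The GNN-based approach the citation describes builds a graph whose nodes are text segments and whose edges connect spatial neighbors, then propagates features along edges to relate nearby entities. A minimal sketch of that idea, where the distance rule, feature layout, and mean aggregation are all illustrative assumptions rather than any cited paper's actual design:

```python
# Minimal sketch of a document graph for VIE-style reasoning:
# nodes are text segments, edges connect spatially close segments,
# and one aggregation step mixes each node's feature with its neighbors'.

def build_graph(segments, max_dist=50.0):
    """Connect segments whose bounding-box centers are within max_dist
    (Manhattan distance on the page). Returns an undirected edge list."""
    edges = []
    for i, a in enumerate(segments):
        for j, b in enumerate(segments):
            if i < j and abs(a["cx"] - b["cx"]) + abs(a["cy"] - b["cy"]) <= max_dist:
                edges.append((i, j))
    return edges

def aggregate(segments, edges):
    """One message-passing step: replace each node's scalar feature with
    the mean over itself and its graph neighbors."""
    neighbors = {i: [] for i in range(len(segments))}
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)
    out = []
    for i, seg in enumerate(segments):
        feats = [seg["feat"]] + [segments[j]["feat"] for j in neighbors[i]]
        out.append(sum(feats) / len(feats))
    return out
```

In a real DKIE model the node features would be multi-modal embeddings (text, visual, and layout) and the aggregation a learned GNN layer; the point here is only the structure: spatial adjacency defines edges, and relevance between neighboring entities is inferred by propagating information along them.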
“…However, the above methods only consider textual feature information, ignoring auxiliary information such as the visual and layout information in documents. GNN-based methods [4]-[6] apply multi-modal features of text fragments as nodes and adopt the edge relationships of the GNN to evaluate the relations between entities. However, these efforts rely on a predefined set of entity categories/labels for each dataset, which prevents the application of the same DKIE model to other datasets.…”
Section: Introduction
confidence: 99%