2021
DOI: 10.48550/arxiv.2106.12940
Preprint
MatchVIE: Exploiting Match Relevancy between Entities for Visual Information Extraction

Abstract: The Visual Information Extraction (VIE) task aims to extract key information from diverse document images (e.g., invoices and purchase receipts). Most previous methods treat the VIE task simply as a sequence-labeling or classification problem, which requires models to carefully identify each kind of semantics by introducing multimodal features such as font, color, and layout. But simply introducing multimodal features does not work well when faced with numeric semantic categories or ambiguous texts.…

Cited by 2 publications (4 citation statements)
References 20 publications
“…The loss function may be described as Equation 6, which incorporates a TSR loss, a CTC loss, and a hyperparameter to balance the two, since the proposed technique uses a multi-task architecture, which means that all the parameters should be trained jointly. In our implementation, we employ cross-entropy loss, which is defined as Equation 7 in terms of the prediction and the matching ground truth.…”
Section: B. Multi-task Table Structure Recognition and Table Cell
confidence: 99%
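The citation above describes a multi-task objective: a TSR loss and a CTC loss combined through a balancing hyperparameter, with cross-entropy as the per-prediction loss. The excerpt omits the actual symbols, so the names below (`lam`, `cross_entropy`, `multitask_loss`) and the weighted-sum form are illustrative assumptions, not the paper's exact formulation:

```python
import math

def cross_entropy(pred_probs, target_idx):
    """Cross-entropy for a single prediction: -log p(target).

    pred_probs is a probability distribution over classes;
    target_idx is the index of the ground-truth class.
    """
    return -math.log(pred_probs[target_idx])

def multitask_loss(tsr_loss, ctc_loss, lam=0.5):
    """Combined multi-task objective in the spirit of Equation 6:
    total = TSR loss + lam * CTC loss, trained jointly."""
    return tsr_loss + lam * ctc_loss

# Example: a confident correct prediction yields a small cross-entropy,
# and the two task losses are blended by the hyperparameter lam.
ce = cross_entropy([0.25, 0.5, 0.25], 1)   # -log(0.5)
total = multitask_loss(ce, 2.0, lam=0.5)
```

Joint training of both heads (rather than alternating per-task updates) is what the quoted passage means by "all the parameters should be trained jointly": one backward pass through the combined scalar loss updates the shared backbone and both task-specific heads.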
“…It is important to note that the majority of current research [1,2,3] on CTC problems focuses on spreadsheet tables, which makes the problem description considerably simpler because spreadsheets can store more meta-information and their default units are cells. Some studies [4], [5] have attempted to extract entities and information from images, which is similar to how we characterised the CTC issue. However, because they have not paid attention to the tables in document images, their problem definitions cannot be solved jointly with the TSR problem.…”
Section: Introduction
confidence: 97%
“…However, the above methods only consider textual feature information, ignoring auxiliary information such as the visual and layout information in documents. Some GNN-based methods [4]-[6] applied multi-modal features of text segments as nodes to build the document graph and adopted the edge relationships of the graph network to infer the information relevance between neighboring nodes. Furthermore, TRIE [18] introduces an end-to-end network, which adopts a multi-modal context block to bridge the OCR and IE (information extraction) modules.…”
Section: Related Work A. Document Key Information Extraction
confidence: 99%
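The GNN-based approach the citation describes builds a graph whose nodes are text segments and whose edges connect spatial neighbors, then propagates features along edges to relate nearby entities. A minimal sketch of that idea, where the distance rule, feature layout, and mean aggregation are all illustrative assumptions rather than any cited paper's actual design:

```python
# Minimal sketch of a document graph for VIE-style reasoning:
# nodes are text segments, edges connect spatially close segments,
# and one aggregation step mixes each node's feature with its neighbors'.

def build_graph(segments, max_dist=50.0):
    """Connect segments whose bounding-box centers are within max_dist
    (Manhattan distance on the page). Returns an undirected edge list."""
    edges = []
    for i, a in enumerate(segments):
        for j, b in enumerate(segments):
            if i < j and abs(a["cx"] - b["cx"]) + abs(a["cy"] - b["cy"]) <= max_dist:
                edges.append((i, j))
    return edges

def aggregate(segments, edges):
    """One message-passing step: replace each node's scalar feature with
    the mean over itself and its graph neighbors."""
    neighbors = {i: [] for i in range(len(segments))}
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)
    out = []
    for i, seg in enumerate(segments):
        feats = [seg["feat"]] + [segments[j]["feat"] for j in neighbors[i]]
        out.append(sum(feats) / len(feats))
    return out
```

In a real DKIE model the node features would be multi-modal embeddings (text, visual, and layout) and the aggregation a learned GNN layer; the point here is only the structure: spatial adjacency defines edges, and relevance between neighboring entities is inferred by propagating information along them.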
“…However, the above methods only consider textual feature information, ignoring auxiliary information such as the visual and layout information in documents. GNN-based methods [4]-[6] apply multi-modal features of text fragments as nodes and adopt the edge relationships of the GNN to evaluate the relations between entities. However, these efforts rely on a predefined set of entity categories/labels for each dataset, which prevents the application of the same DKIE model to other datasets.…”
Section: Introduction
confidence: 99%