2021
DOI: 10.48550/arxiv.2106.00978
Preprint
A Span Extraction Approach for Information Extraction on Visually-Rich Documents

Abstract: Information extraction (IE) for visually-rich documents (VRDs) has achieved SOTA performance recently thanks to the adaptation of Transformer-based language models, which shows the great potential of pre-training methods. In this paper, we present a new approach to improve the capability of language model pre-training on VRDs. Firstly, we introduce a new query-based IE model that employs span extraction instead of using the common sequence labeling approach. Secondly, to extend the span extraction formulation,…
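The query-based span extraction idea from the abstract can be sketched as follows: given a queried field, each document token is scored as a candidate start and end of the field's value, and the best-scoring (start, end) pair is returned. This is a minimal illustration only, not the paper's actual architecture; the concatenation-based query fusion, the weight vectors `w_start`/`w_end`, and all names are assumptions for the sketch.

```python
import numpy as np

def extract_span(token_reprs, query_repr, w_start, w_end):
    """Pick the best value span for one queried field.

    token_reprs: (n_tokens, d_tok) representations of the OCR tokens.
    query_repr:  (d_query,) representation of the queried field.
    w_start, w_end: scoring vectors of size d_tok + d_query.
    Returns the (start, end) token indices with the highest combined score.
    """
    n = token_reprs.shape[0]
    # Fuse the query with every token (illustrative fusion: concatenation).
    fused = np.concatenate(
        [token_reprs, np.tile(query_repr, (n, 1))], axis=1
    )
    start_logits = fused @ w_start  # score each token as a span start
    end_logits = fused @ w_end      # score each token as a span end
    # Search all valid pairs with start <= end.
    best, best_score = (0, 0), -np.inf
    for i in range(n):
        for j in range(i, n):
            score = start_logits[i] + end_logits[j]
            if score > best_score:
                best_score, best = score, (i, j)
    return best
```

In practice the start/end logits would come from a pre-trained Transformer encoder over the query and document tokens, and the pair search would be restricted to a maximum span length; the exhaustive loop here is just for clarity.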

Cited by 1 publication (2 citation statements)
References 9 publications
“…Majumder et al. (2020) present a field-value pairing framework that learns the representations of fields and value candidates in the same feature space using metric learning. Nguyen et al. (2021) propose a span extraction approach to extract the start and end of a value for each queried field. Gao et al. (2021) introduce a field extraction system that can be trained with large-scale unlabeled documents.…”
Section: Related Work
confidence: 99%
“…Our baseline. We implement our baseline following (Majumder et al., 2020; Nguyen et al., 2021). Unlike our method, which utilizes a unified transformer to deeply model interactions among the query words and the OCR words, our baseline models the interactions in a shallower way (see Section A for details).…”
Section: Experimental Settings
confidence: 99%