Artificial Intelligence and Machine Learning 2022
DOI: 10.5121/csit.2022.121220
|View full text |Cite
|
Sign up to set email alerts
|

Learning Structured Information from Small Datasets of Heterogeneous Unstructured Multipage Invoices

Abstract: We propose an end to end approach using graph construction and semantic representation learning to solve the problem of structured information extraction from heterogeneous, semi-structured, and high noise human readable documents. Our system first converts PDF documents into single connected graphs where we represent each token on the page as a node, with vertices consisting of the inverse euclidean distances between tokens. Token, lines, and individual character nodes are augmented with dense text model vect… Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...

Citation Types

0
0
0

Publication Types

Select...

Relationship

0
0

Authors

Journals

citations
Cited by 0 publications
references
References 10 publications
0
0
0
Order By: Relevance

No citations

Set email alert for when this publication receives citations?