Visual and Textual Deep Feature Fusion for Document Image Classification

Bakkali, Souhail; Ming, Zuheng; Coustaty, Mickaël; Rusiñol, Marçal

doi:10.1109/cvprw50498.2020.00289

Cited by 34 publications

(33 citation statements)

References 18 publications

“…Subsequently, these are used by other neural network modules or combined with other features, e.g., textual ones. Algorithms using textual information, like [5] for document classification, use word or paragraph embeddings created by deep learning frameworks like BERT [9]. BERT stands for "Bidirectional Encoder Representations from Transformers", which is a transformer-based model used for NLP tasks.…”

Section: Related Work (mentioning, confidence: 99%)

ICPR 2020 Competition on Text Block Segmentation on a NewsEye Dataset

Michael, Weidemann, Laasch et al. 2021

Pattern Recognition. ICPR International Workshops and Challenges

We present a competition on text block segmentation within the framework of the International Conference on Pattern Recognition (ICPR) 2020. The main goal of this competition is to automatically analyse the structure of historical newspaper pages and subsequently evaluate the performance of the participants' algorithms. In contrast to many existing segmentation methods, which work on pixels, the present study focuses on clustering baselines/text lines into text blocks. Therefore, we introduce a new measure based on a baseline detection evaluation scheme. However, common pixel-based approaches could also participate without restrictions. Working at the baseline level directly addresses the application scenario in which the text contained in a given image should be extracted in blocks for further investigation. We present the results of three submissions. The experiments have shown that text blocks can be reliably detected both on pages with a simple layout and on pages with a complex layout.
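The abstract frames segmentation as grouping baselines (text lines) into text blocks rather than labelling pixels. As an illustration only, a naive greedy grouping could look like the sketch below. It is not any participant's method and not the competition's evaluation measure; the `Baseline` representation (a horizontal extent plus a single y coordinate) and the `max_gap`/`min_overlap` thresholds are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Baseline:
    """Hypothetical simplified baseline: horizontal extent and one y coordinate (pixels)."""
    x0: float
    x1: float
    y: float


def horizontal_overlap(a: Baseline, b: Baseline) -> float:
    """Overlap of the two x-extents, as a fraction of the shorter line."""
    inter = min(a.x1, b.x1) - max(a.x0, b.x0)
    shorter = min(a.x1 - a.x0, b.x1 - b.x0)
    if shorter <= 0:
        return 0.0
    return max(0.0, inter) / shorter


def cluster_baselines(lines, max_gap=40.0, min_overlap=0.5):
    """Greedily assign baselines, top to bottom, to text blocks.

    A line joins the first block whose last line lies at most `max_gap` above it
    and overlaps it horizontally by at least `min_overlap`; otherwise it opens
    a new block. Both thresholds are illustrative assumptions.
    """
    blocks: list[list[Baseline]] = []
    for line in sorted(lines, key=lambda b: b.y):
        for block in blocks:
            last = block[-1]
            if line.y - last.y <= max_gap and horizontal_overlap(last, line) >= min_overlap:
                block.append(line)
                break
        else:
            blocks.append([line])
    return blocks


# Toy page: a full-width headline above two columns of three lines each.
page = [Baseline(0, 250, 20)]
page += [Baseline(0, 100, y) for y in (100, 130, 160)]
page += [Baseline(150, 250, y) for y in (100, 130, 160)]
blocks = cluster_baselines(page)
print([len(b) for b in blocks])
```

On the toy page this yields three blocks (the headline and one per column). With `max_gap=80`, however, the headline absorbs the left column, which illustrates why fixed geometric rules struggle on the complex layouts the competition targets.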

“…The newspapers made available for this competition comprise the titles "Arbeiter Zeitung", "Illustrierte Kronen Zeitung", "Innsbrucker Nachrichten" and "Neue Freie Presse". The data can be downloaded from the competition website⁵.…”

Section: Data (mentioning, confidence: 99%)

ICPR 2020 Competition on Text Block Segmentation on a NewsEye Dataset

Michael, Weidemann, Laasch et al. 2021

Pattern Recognition. ICPR International Workshops and Challenges

“…In [4], an approach is proposed that is based on extracting, analysing and combining textual and visual streams for document image classification; the visual stream uses deep CNNs to extract structural features of the images, and the accuracy depends on the type of input data. Study [5] proposes a two-stream neural architecture for the document image classification task, using a joint feature learning approach that combines features of the image and its textual parts; this joint learning approach achieves a classification accuracy of up to 97.05%. The advantage of using a neural network approach is that it dispenses with templates.…”

Section: Abstract: Document Management Automation, Intelligent Document Management, Document Classification, Convolutional Neural Network, Im… (unclassified)
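The two-stream idea summarised above (one stream producing visual features, another producing textual features, combined before classification) can be sketched as feature-level fusion by concatenation followed by a linear softmax classifier. This is a minimal illustration under stated assumptions: the feature vectors are stand-ins for CNN and text-encoder outputs, the tiny dimensions and seeded random weights are hypothetical, and it is not claimed to be the exact fusion scheme of [4] or [5].

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear(x, weights, bias):
    """y = W x + b, with W given as one row per output class."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def fuse_and_classify(visual_feat, text_feat, weights, bias):
    """Feature-level (early) fusion: concatenate both streams, then classify."""
    fused = list(visual_feat) + list(text_feat)
    return softmax(linear(fused, weights, bias))


# Hypothetical sizes: 4-d visual feature, 3-d textual feature, 5 document classes.
rng = random.Random(0)
visual = [rng.uniform(-1, 1) for _ in range(4)]   # stand-in for a CNN image embedding
textual = [rng.uniform(-1, 1) for _ in range(3)]  # stand-in for a text embedding
n_classes, fused_dim = 5, len(visual) + len(textual)
W = [[rng.gauss(0, 0.1) for _ in range(fused_dim)] for _ in range(n_classes)]
b = [0.0] * n_classes

probs = fuse_and_classify(visual, textual, W, b)
predicted = max(range(n_classes), key=probs.__getitem__)
print(predicted, [round(p, 3) for p in probs])
```

Concatenation lets a single classifier weigh visual and textual cues jointly; a common alternative is score-level (late) fusion, for example averaging the softmax outputs of two separately trained streams.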

Classification of Scanned Documents Using a Convolutional Neural Network

Kotyuzhanskiy, Chetverkin, Protasevich et al. 2021

СНТ (MHT)

Currently, one of the pressing tasks in automating an organization's document workflow, when diverse documentation arrives from a large number of counterparties, is the verification and classification of scanned materials. The article presents an analysis and the main characteristics of existing ways of solving this task. The aim of the study is to develop a software module that can classify documents with an accuracy of at least 97% in real time, which is relevant for electronic document management in large and medium-sized companies. A solution based on a convolutional neural network (CNN) is described. The input to the software module is a PDF file of a scanned document, and the output is an XML file containing the document class. To increase the accuracy and speed of the program, the problems of encoding the signal for the neural network and of determining its structure were solved. The stages of processing scanned documents and the architecture of the developed neural network are described. The proposed method can classify pages with high accuracy on a small dataset. The program was tested on a dataset of 9628 pages and 22 possible classes. The accuracy was 99.1%. The time to classify one page, excluding file reading and copying to the GPU, is 2 ms on a GeForce 780TI. The full classification time per page is approximately 22.3 ms. Keywords: document workflow automation, intelligent document management, document classification, convolutional neural network, image recognition
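The abstract specifies the module's interface only at a high level: a scanned PDF goes in, and an XML file with the document class comes out. The schema is not given, so the element and attribute names below (`document`, `source`, `page`, `number`, `class`) and the example labels are hypothetical; the sketch only shows how per-page predictions could be serialised with Python's standard `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET


def predictions_to_xml(source_pdf, page_labels):
    """Serialise per-page class labels into a hypothetical XML layout."""
    root = ET.Element("document", source=source_pdf)
    for page_no, label in enumerate(page_labels, start=1):
        page = ET.SubElement(root, "page", number=str(page_no))
        ET.SubElement(page, "class").text = label
    # encoding="unicode" makes tostring return a str instead of bytes
    return ET.tostring(root, encoding="unicode")


# Hypothetical file name and class labels for a two-page scan.
xml_text = predictions_to_xml("scan_0001.pdf", ["invoice", "contract"])
print(xml_text)
```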

“…The classification of documents into different known classes helps to improve the overall performance of document processing systems [1]. Consequently, many approaches have been proposed for document classification that use either text content [3][4][5] or document structure [6][7][8][9] to categorize documents into different classes, or that use both modalities [10][11][12][13]. There has been much advancement in this area, especially using deep learning methods [6,14,15].…”

Section: Introduction (mentioning, confidence: 99%)

“…Therefore, these document images convey high-level structural information through their features, but the low-level features that can disambiguate visually similar images have long remained uninvestigated. Various papers, such as [10], [11] and [13], investigate the possibility of involving additional features to improve accuracy. These papers obtained state-of-the-art results.…”

Section: Introduction (mentioning, confidence: 99%)

EmmDocClassifier: Efficient Multimodal Document Image Classifier for Scarce Data

Kanchi, Pagani, Mokayed et al. 2022

Preprint

Document classification is one of the most critical steps in the document analysis pipeline. There are two types of approaches to document classification: image-based and multimodal approaches. Image-based approaches rely solely on the inherent visual cues of the document images. In contrast, the multimodal approach co-learns visual and textual features and has proved to be more effective. Nonetheless, these approaches require a large amount of data. This paper presents a novel approach for document classification that works with a small amount of data and outperforms other approaches. The proposed approach incorporates a hierarchical attention network (HAN) for the textual stream and EfficientNet-B0 for the image stream. The hierarchical attention network in the textual stream uses dynamic word embeddings from a fine-tuned BERT. HAN incorporates both word-level and sentence-level features. While earlier approaches rely on training on a large corpus (RVL-CDIP), we show that our approach works with a small amount of data (Tobacco-3482). To this end, we trained the neural network on Tobacco-3482 from scratch. In doing so, we outperform the state of the art with an accuracy of 90.3%, which corresponds to a relative error reduction of 7.9%.
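The abstract does not define "relative error reduction". Under the usual definition, assumed here, it is the drop in error rate divided by the previous error rate, with error = 1 - accuracy. The helper below computes it and inverts it to estimate the previous accuracy that the two reported figures would imply; because 90.3% and 7.9% are rounded, the result (roughly 89.5%) is only approximate.

```python
def relative_error_reduction(old_acc: float, new_acc: float) -> float:
    """(old_error - new_error) / old_error, with error = 1 - accuracy."""
    old_err, new_err = 1.0 - old_acc, 1.0 - new_acc
    return (old_err - new_err) / old_err


def implied_baseline_accuracy(new_acc: float, rer: float) -> float:
    """Invert the definition above: old_error = new_error / (1 - rer)."""
    return 1.0 - (1.0 - new_acc) / (1.0 - rer)


# Reported figures from the abstract: 90.3% accuracy, 7.9% relative error reduction.
baseline = implied_baseline_accuracy(0.903, 0.079)
print(f"implied previous accuracy: {baseline:.2%}")
print(f"round-trip check: {relative_error_reduction(baseline, 0.903):.1%}")
```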
