Xiameng Qin scite author profile

Xiameng Qin

4 Publications

72 Citation Statements Received

228 Citation Statements Given

How they've been cited

157

How they cite others

108

228

Affiliations

Vision Technology (United States), Beijing Institute of Technology, Baidu (China)

Publications

StrucTexT: Structured Text Understanding with Multi-Modal Transformers

Qian

et al. 2021

Structured text understanding on Visually Rich Documents (VRDs) is a crucial part of Document Intelligence. Due to the complexity of content and layout in VRDs, structured text understanding has been a challenging task. Most existing studies decoupled this problem into two sub-tasks: entity labeling and entity linking, which require an entire understanding of the context of documents at both token and segment levels. However, little work has been concerned with the solutions that efficiently extract the structured data from different levels. This paper proposes a unified framework named StrucTexT, which is flexible and effective for handling both sub-tasks. Specifically, based on the transformer, we introduce a segment-token aligned encoder to deal with the entity labeling and entity linking tasks at different levels of granularity. Moreover, we design a novel pre-training strategy with three self-supervised tasks to learn a richer representation. StrucTexT uses the existing Masked Visual Language Modeling task and the new Sentence Length Prediction and Paired Boxes Direction tasks to incorporate the multi-modal information across text, image, and layout. We evaluate our method for structured text understanding at segment-level and token-level and show it outperforms the state-of-the-art counterparts with significantly superior performance on the FUNSD, SROIE, and EPHOIE datasets.

Robust Match Fusion Using Optimization

Qin

Shen

Mao

et al. 2015

IEEE Trans. Cybern.

In this paper, we present a novel patch-based match and fusion algorithm that uses optimization to account for moving scenes in a multiple-exposure image sequence. A uniform iterative approach is developed to match and find the corresponding patches in different exposure images, which are then fused in each iteration. Our approach does not need to align the input multiple-exposure images before the fusion process. Considering that the pixel values are affected by varying exposure times, we design a new patch-based energy function that is optimized to improve the matching accuracy. An efficient patch-based exposure fusion approach using the random walker algorithm is developed to preserve the moving objects from the input multiple-exposure images. To the best of our knowledge, our algorithm is the first patch-based exposure fusion work to preserve the moving objects of dynamic scenes without requiring registration of the different exposure images. Experimental results on moving scenes demonstrate that our algorithm achieves visually pleasing fusion results without ghosting artifacts, while the results produced by state-of-the-art exposure fusion and tone mapping algorithms exhibit different levels of ghosting artifacts.

EATEN: Entity-Aware Attention for Single Shot Visual Text Extraction

Qin

Liu

et al. 2019

Extracting entities from images is a crucial part of many OCR applications, such as entity recognition of cards, invoices, and receipts. Most of the existing works employ the classical detection and recognition paradigm. This paper proposes an Entity-aware Attention Text Extraction Network called EATEN, which is an end-to-end trainable system to extract the entities without any post-processing. In the proposed framework, each entity is parsed by its corresponding entity-aware decoder. Moreover, we introduce a state transition mechanism which further improves the robustness of entity extraction. In consideration of the absence of public benchmarks, we construct a dataset of almost 0.6 million images in three real-world scenarios (train ticket, passport, and business card), which is publicly available at https://github.com/beacandler/EATEN. To the best of our knowledge, EATEN is the first single shot method to extract entities from images. Extensive experiments on these benchmarks demonstrate the state-of-the-art performance of EATEN.

Bilateral Cross-Modality Graph Matching Attention for Feature Fusion in Visual Question Answering

Cao

Qin

Zhao

et al. 2024

IEEE Trans. Neural Netw. Learning Syst.
