Transformers-based information extraction with limited data for domain-specific business documents

Nguyen, Minh-Tien; Le, Dung Tien; Le, Linh

doi:10.1016/j.engappai.2020.104100

Cited by 24 publications

(7 citation statements)

References 8 publications


“…Liu et al [14] proposed a pattern-based approach to extract disease and drug combination pairs from MEDLINE abstracts. Nguyen et al [15] utilized the NLP model Transformer to extract information from domain-specific business documents with limited training data. M. Kerroumi et al [16] proposed a multimodal approach, VisualWordGrid, to extract information from documents with rich visual characteristics, such as tables.…”

Section: Data Integration (mentioning)

confidence: 99%

AI-enabled Legacy Data Integration with Privacy Protection: a Case Study on Regional Cloud Arbitration Court

Song, Jiao et al. 2023

Preprint


This paper reports an interesting case study on the Legacy Data Integration (LDI for short) for a Regional Cloud Arbitration Court. Due to the inconsistent structure and presentation, legacy arbitration cases can hardly integrate into the Cloud Court unless processed manually. In the case study, we aim to build an AI-enabled LDI method to replace the high-cost manual one and protect privacy during the process. Our method employs Optical Character Recognition (OCR), text classification, Named Entity Recognition (NER), and entity relation extraction to transform legacy data into system format. We train AI models to replace the tasks of the Court staff, such as reading and understanding legacy cases, removing privacy information, composing new records of cases to fit the Cloud Court, and inputting them through the system interfaces. With the applications of a Cloud Arbitration Court in Liaoning Province, China, our intelligent LDI has similar effectiveness but greater efficiency than the manual LDI. Our method saves 90% of the workforce and achieves a 60%-70% information extraction rate of manual work. Our method achieves a comparable filtering effect for privacy while retaining the maximum amount of information. With the continuous development of informationalization and intelligentization in judgment and arbitration, many courts are building the court system using ABC technologies, namely Artificial intelligence, Big data, and Cloud computing. Our method could provide a practical reference when integrating legal data into the system.


“…With modern advancements in deep learning technology and the increased need for processing large text datasets, researchers have been optimizing the task of automated text segmentation. Common applications of this natural language processing (NLP) task include information retrieval (Oh et al, 2007; Nguyen et al, 2021), topic segmentation (Arnold et al, 2019; Aumiller et al, 2021), and document summarization (Chuang and Yang, 2000). These tasks can take either linear or hierarchical approaches, with the latter taking into account structural representation of topics within documents (Glavaš and Swapna, 2020).…”

Section: Related Work (mentioning)

confidence: 99%

ANTS: A Framework for Retrieval of Text Segments in Unstructured Documents

Chivers, Jiang, Lee et al. 2022

Proceedings of the Third Workshop on Deep Learning for Low-Resource Natural Language Processing


Text segmentation and extraction from unstructured documents can provide business researchers with a wealth of new information on firms and their behaviors. However, the most valuable text is often difficult to extract consistently due to substantial variations in how content can appear from document to document. Thus, the most successful way to extract this content has been through costly crowdsourcing and training of manual workers. We propose the Assisted Neural Text Segmentation (ANTS) framework to identify pertinent text in unstructured documents from a small set of labeled examples. ANTS leverages deep learning and transfer learning architectures to empower researchers to identify relevant text with minimal manual coding. Using a real world sample of accounting documents, we identify targeted sections 96% of the time using only 5 training examples.


“…Cho et al (2020) describe a method of automatically classifying the types of documents through a neural network for scanned business documents [29], Lee et al, (2018) presents the automatic classification according to the KSIC (Korea Standard Industry Code) [30], and Yun et al, (2018) describes the automatic classification method for business documents for which document classification is not defined [31]. In addition, Tien et al, (2020) describe a method for deriving meaning for unlabeled documents and then performs classification [32].…”

Section: Review Of Advanced Research For Bda (mentioning)

confidence: 99%

Improvement of Business Productivity by Applying Robotic Process Automation

et al. 2021


Digitalization has been bringing about various changes and innovations not only in our daily life but also in our business environment. In the manufacturing industry, robots have been used for automation for a long time, resulting in innovation in terms of the faster operation process and higher product quality. Robotics Process Automation (RPA) can be said to have brought this innovation in the productivity improvement of many industries into the business office. The purpose of this study is to improve business productivity by applying RPA named CoPA. It is based on Domain-Specific Languages (DSLs) and Model-Driven Engineering (MDE) coupled with MS Office. CoPA has been replaced to perform the repetitive patterned tasks (especially document work) done by many people in an office. For the applications of business productivity, CoPA has been implemented to revise five government project proposals requiring quite strict writing standards. The improvement of business productivity obtained by CoPA has been compared to the performance of 10 employees who are familiar with MS Office. The paper explains the method of CoPA coupled with MS Office as well as the agile method of human collaboration. It is clearly shown that CoPA as a business RPA can improve business productivity in terms of time consumption and document quality.

