2021
DOI: 10.1007/s41060-021-00285-x

Anonymization of German financial documents using neural network-based language models with contextual word representations

Abstract: The automatization and digitalization of business processes have led to an increase in the need for efficient information extraction from business documents. However, financial and legal documents are often not utilized effectively by text processing or machine learning systems, partly due to the presence of sensitive information in these documents, which restricts their usage beyond authorized parties and purposes. To overcome this limitation, we develop an anonymization method for German financial and legal d…

citations
Cited by 16 publications
(8 citation statements)
references
References 18 publications
“…There are efforts to automate the de-identification of German text data using ML methods, from medical and other domains. However, currently, these methods cannot guarantee 100% accuracy [10, 12]. Consequently, efficient development of NLP models for structuring radiological reports on-site is of great interest.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…Recently, the memorization effect in LMs has been further exploited in the federated learning setting (Konečný et al, 2016), where in combination with the information leakage from model updates (Melis et al, 2019; Huang et al, 2020), the attacker is capable of recovering private text in federated learning (Gupta et al, 2022). To mitigate privacy risks, there is a growing interest in making language models privacy-preserving (Yu et al, 2022; Shi et al, 2022b; Yue et al, 2023; Cummings et al, 2023) by training them with a differential privacy guarantee (Dwork et al, 2006b; Abadi et al, 2016) or with various anonymization approaches (Nakamura et al, 2020; Biesner et al).…”
Section: Privacy Risks In Language Models
Citation type: mentioning
confidence: 99%
“…IRE extract formulas from the text of numerical description and then these formulas are used for NCC. Biesner et al [1] developed a framework based on state-of-the-art deep learning techniques to anonymize sensitive information in financial documents in the German language so that the documents can be further used in other applications without any restriction. As compared to the approaches that do consistency checking, this paper's approach automates the consistency checking task using different transformer-based tabular models.…”
Section: Related Work
Citation type: mentioning
confidence: 99%