2021 ACM/IEEE Joint Conference on Digital Libraries (JCDL)
DOI: 10.1109/jcdl52503.2021.00045

Evaluating BERT's Encoding of Intrinsic Semantic Features of OCR'd Digital Library Collections

Cited by 5 publications (2 citation statements)
References 3 publications

“…These state-of-the-art French pretrained language models have been fine-tuned on various data for text classification tasks, such as tweets classification [19] or clinical notes classification [4]. Finally, studies on automatic speech transcriptions [14] and digitized texts with optical character recognition (OCR) [18] analyzed the impact of noisy inputs on contextualized word embeddings. Indeed, erroneous data, such as ungrammatical sentences found in many exercise statements, may cause a decline in performance due to model resilience.…”
Section: Related Work (mentioning)
confidence: 99%
“…On the other hand, NLP is a branch of artificial intelligence that focuses on the interaction between computers and human language. It empowers computers to understand, interpret, and generate human language in a way that is both meaningful and contextually relevant [7]. In addition to facilitating human-computer interaction, NLP is essential for data analysis because it enables computers to interpret and extract information from massive amounts of unstructured text data.…”
Section: Introduction (mentioning)
confidence: 99%