2021
DOI: 10.1007/s10579-021-09552-6

Labelling the past: data set creation and multi-label classification of Dutch archaeological excavation reports

Abstract: The extraction of information from Dutch archaeological grey literature has recently been investigated by the AGNES project. AGNES aims to disclose relevant information by means of a web search engine, to enable researchers to search through excavation reports. In this paper, we focus on the multi-labelling of archaeological excavation reports with time periods and site types, and provide a manually labelled reference set to this end. We propose a series of approaches, pre-processing methods, and various modif…

Cited by 6 publications (9 citation statements)
References 16 publications
“…In the past, for newly unearthed archaeological excavation sites, archaeologists generally used bamboo poles to build fences to "enclose" them. Nowadays, these "rough" methods have long been replaced by more complex, efficient, and secure operations [3]. At the site of archaeological excavations, the working platform acts like a hanging basket, placing the archaeologists in protective clothing into the pit to hang, and changing the position, direction, and angle at any time to minimize the possibility of contaminating artifacts and the possibility of filling pits with excavators.…”
Section: Introduction (mentioning)
confidence: 99%
“…The three Dutch domain-generic models are BERT-NL [17], BERTje [15], and RobBERT [18]. BERT-NL [17] is a BERT-based model that was pre-trained exclusively on the SoNaR-500 corpus [19], a corpus comprising 500 million words from a wide range of domains and genres.…”
Section: Related Work (mentioning)
confidence: 99%
“…The three Dutch domain-generic models are BERT-NL [17], BERTje [15], and RobBERT [18]. BERT-NL [17] is a BERT-based model that was pre-trained exclusively on the SoNaR-500 corpus [19], a corpus comprising 500 million words from a wide range of domains and genres. In contrast, BERTje (also BERT-based) was not only pre-trained on this corpus but also on books, news articles and related documents, and Wikipedia articles (total 12 GB) [15].…”
Section: Related Work (mentioning)
confidence: 99%
“…Unlike NER where extraction of entities is the goal, document classification aims to assign one or more labels to a text. But similar to NER, this is often done to create metadata (Brandsen & Koole, 2021). Another approach is to label documents as relevant or irrelevant for a particular research question (Fischer et al., 2021).…”
Section: Document Classification (mentioning)
confidence: 99%
“…casing and symbols can be key indicators for entities. Lastly, another option is to simply try all possible combinations of preprocessing steps in a brute force method, and select the best performing combinations (Brandsen & Koole, 2021).…”
Section: Selecting Preprocessing Steps (mentioning)
confidence: 99%
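
The last statement above describes trying all possible combinations of preprocessing steps and selecting the best performing one. A minimal Python sketch of that idea follows. It is an illustration only, not the cited authors' implementation: the three preprocessing functions, the names `STEPS`, `all_step_subsets`, `apply_steps` and `best_combination`, and the scoring function are all hypothetical. In a real setting `score_fn` would presumably train a classifier on the processed texts and return a validation metric.

```python
from itertools import combinations

# Hypothetical preprocessing steps, chosen only for illustration.
def lowercase(text):
    return text.lower()

def strip_digits(text):
    return "".join(ch for ch in text if not ch.isdigit())

def collapse_whitespace(text):
    return " ".join(text.split())

# Steps are applied in this fixed order whenever they are selected.
STEPS = {
    "lowercase": lowercase,
    "strip_digits": strip_digits,
    "collapse_whitespace": collapse_whitespace,
}

def all_step_subsets(step_names):
    """Yield every subset of step_names, from the empty subset to the full set."""
    names = list(step_names)
    for size in range(len(names) + 1):
        yield from combinations(names, size)

def apply_steps(text, subset):
    """Apply the selected steps to one text, in the order given by subset."""
    for name in subset:
        text = STEPS[name](text)
    return text

def best_combination(texts, score_fn):
    """Score every subset of steps and return (best_subset, best_score).

    score_fn receives the list of processed texts and returns a number,
    where higher is better. Ties keep the first subset encountered.
    """
    best_subset, best_score = None, float("-inf")
    for subset in all_step_subsets(STEPS):
        processed = [apply_steps(t, subset) for t in texts]
        score = score_fn(processed)
        if score > best_score:
            best_subset, best_score = subset, score
    return best_subset, best_score
```

With n candidate steps this evaluates 2^n subsets, so the approach is only practical for a small number of steps. The sketch also fixes the order in which steps are applied; searching over orderings as well would enlarge the search space further.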