2023
DOI: 10.1002/asi.24765

The “Collections as ML Data” checklist for machine learning and cultural heritage

Abstract: Within cultural heritage, there has been a growing and concerted effort to consider a critical sociotechnical lens when applying machine learning techniques to digital collections. Though the cultural heritage community has collectively developed an emerging body of work detailing responsible operations for machine learning in galleries, museums, archives, and libraries at the organizational level, there remains a paucity of guidelines created for researchers embarking on machine learning projects with digital…

Cited by 9 publications (5 citation statements); references 39 publications.
“…In addition, existing studies are less likely to use machine learning methods for causal inference examination of VCC in the cultural heritage domain (Zhang et al, 2022). Some scholars believe that there is great potential for exploring digital collections in the cultural heritage domain by applying machine learning methods (Candela, 2023; Fiorucci et al, 2020), and even proposed the "Collections as ML Data" checklist for machine learning and cultural heritage (Lee, 2023). Given this, massive datasets (e.g., open data, open Linked Data, etc.…”
Section: Methodological Agenda (mentioning; confidence: 99%)
“…For example, historical newspapers made available to the public in Chronicling America have been used to train a deep learning model to extract several classes of visual content, including headlines, photographs, illustrations, maps, comics, editorial cartoons, and advertisements (Lee et al, 2020). A ML checklist has been recently published that provides guidelines and best practices to develop a ML project based on CH data (Lee, 2022). An entity linking approach addressed to multilingual newspapers has also been recently proposed that can be applied to a broad range of text categories (Labusch & Neudecker, 2020).…”
Section: Data Quality Assessment and Reuse of CH Datasets (mentioning; confidence: 99%)
“…Here, a checklist publication workflow was proposed including aspects such as source data management, reproducible data transformation, version control, data documentation and publication (Reyserhove et al 2020). Other initiatives include a checklist for developing a machine learning project based on cultural heritage data (Lee 2022) or a checklist for a Data Management Plan (Digital Curation Centre 2013). Regarding Collections as data at GLAM institutions, previous work has proposed a methodology to select datasets for computationally driven research applied to Spanish text corpora in order to encourage Spanish and Latin American institutions to publish machine-actionable collections (Candela et al 2021).…”
Section: A Checklist to Publish Collections as Data in GLAM Institutions (mentioning; confidence: 99%)