Our study explores the possible uses and effectiveness of network analysis, including Metadata Record Graphs, for evaluating collections of metadata records at scale. We present the results of an experiment applying these methods to records in the University of North Texas (UNT) Digital Library and two sub-collections of different compositions: the UNT Scholarly Works collection, which functions as an institutional repository, and a collection of architectural slide images. The data includes count- and value-based statistics with network metrics for every Dublin Core element in each set. The study finds that network analysis provides useful information that supplements other metrics, for example by identifying records that are completely unconnected to other items through the subject, creator, or other field values. Additionally, network density may help managers identify collections or records that could benefit from enhancement. We also discuss the constraints of these metrics and suggest possible future applications.
Evolving user needs and relevance require continuous change and reform. A good digital collection has mechanisms to accommodate the differing uses being made of the digital library system. In a metadata management context, change can mean transforming, substituting, or otherwise making the content of a metadata record different from what it is or from what it would be if left alone. In light of evolving compliance requirements, this paper analyses the three most common types of change within metadata records, as well as their subcategories, and discusses the possible implications of such changes within and beyond the metadata records.
Considering the value of dates in the life cycle of the digital resource, capturing and storing date metadata in a structured way can have a significant impact on information retrieval. There are a number of format conventions in common use for encoding date and time values; the Extended Date/Time Format (EDTF) is one of the most expressive. This paper presents results of an exploratory analysis of the representation of dates in over 8 million metadata records from one of the largest digital aggregators, the Digital Public Library of America (DPLA), and compares it to the EDTF specification. This benchmark study provides empirical data, at both the individual provider level and the group level (content hubs or service hubs), about the overall level and patterns of application of date metadata in DPLA metadata records in relation to EDTF.
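As a concrete illustration of the kind of conformance check such an analysis involves, the sketch below tests date strings against a small, illustrative subset of EDTF Level 0/1 patterns. The pattern list and function name are assumptions for demonstration only; the full EDTF specification covers many more forms (intervals, seasons, sets), and the DPLA study itself may have used different tooling.

```python
import re

# Illustrative subset of EDTF Level 0/1 patterns (not the full specification):
# year, year-month, year-month-day, uncertain/approximate qualifiers,
# and an unspecified final digit (e.g., a decade like "196X").
EDTF_PATTERNS = [
    r"^\d{4}$",                                         # 1984
    r"^\d{4}-(0[1-9]|1[0-2])$",                         # 1984-06
    r"^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$",   # 1984-06-15
    r"^\d{4}[?~%]$",                                    # 1984? / 1984~ / 1984%
    r"^\d{3}X$",                                        # 196X (decade)
]

def looks_like_edtf(value: str) -> bool:
    """Return True if the value matches one of the sampled EDTF patterns."""
    return any(re.match(p, value) for p in EDTF_PATTERNS)

# Typical conformant vs. free-text values found in aggregated metadata.
for s in ["1984", "1984-06", "196X", "1984?", "June 1984", "ca. 1920"]:
    print(s, "->", looks_like_edtf(s))
```

Free-text values such as "June 1984" or "ca. 1920" carry the same information as "1984-06" or "1920~" but fail machine parsing, which is why structured encoding matters for retrieval.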
Purpose of this paper The purpose of this study is to develop and evaluate a workflow for establishing name authority in uncontrolled collections. Design/methodology/approach We developed a workflow incorporating command-line tools and tested it in our electronic theses and dissertations (ETDs) collection. We narrowed the scope of the study to born-digital ETDs in the collection and to contributor names, including chairs and committee members. Findings This workflow can save staff time and allows for flexible implementation depending on staff numbers and skills as well as institutional needs. Originality/value This workflow could be used by other institutions with little or no modification, as it does not rely on specialized software or extensive expertise.
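One step such a name-authority workflow typically includes is clustering variant forms of the same contributor name so they can be reconciled against an authorized form. The sketch below is a hypothetical illustration of that idea using a simple fingerprint key (lowercase, punctuation stripped, tokens sorted); the function names and data are invented for demonstration and are not the paper's actual tooling.

```python
import re
from collections import defaultdict

def fingerprint(name: str) -> str:
    """Normalize a name into a matching key: lowercase, strip punctuation,
    and sort tokens so 'Smith, Jane' and 'Jane Smith' collide."""
    tokens = re.sub(r"[^\w\s]", " ", name.lower()).split()
    return " ".join(sorted(tokens))

def cluster_names(names):
    """Group raw name strings sharing a fingerprint; each multi-member
    cluster is a candidate set of variants for one authorized form."""
    clusters = defaultdict(set)
    for n in names:
        clusters[fingerprint(n)].add(n)
    return {k: sorted(v) for k, v in clusters.items() if len(v) > 1}

# Hypothetical contributor values pulled from ETD records.
contributors = ["Smith, Jane", "Jane Smith", "Smith, Jane.", "Doe, John"]
print(cluster_names(contributors))
```

A staff member would then review each cluster and pick (or create) the authorized heading, which is where the time savings over fully manual comparison come from.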
Purpose This study furthers metadata quality research by providing complementary network-based metrics and insights to analyze metadata records and identify areas for improvement. Design/methodology/approach Metadata record graphs apply network analysis to metadata field values; this study evaluates the interconnectedness of subjects within each Hub aggregated into the Digital Public Library of America. It also reviews the effects of NACO normalization (simulating revision of values for consistency) and of breaking up pre-coordinated subject headings (simulating application of the Faceted Application of Subject Terminology to Library of Congress Subject Headings). Findings Network statistics complement count- or value-based metrics by providing context related to the number of records a user might actually find starting from one item and moving to others via shared subject values. Additionally, connectivity increases through the normalization of values to correct or adjust for formatting differences, or by breaking pre-coordinated subject strings into separate topics. Research limitations/implications This analysis focuses on exact-string matches, which is the lowest common denominator for searching, although many search engines and digital library indexes may use less stringent matching methods. In terms of practical implications for evaluating or improving subjects in metadata, the normalization components demonstrate where resources may be most effectively allocated for these activities (depending on a collection). Originality/value Although the individual components of this research are not particularly novel, network analysis has not generally been applied to metadata analysis. This research furthers previous studies related to metadata quality analysis of aggregations and digital collections in general.
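The core idea of a metadata record graph can be sketched in a few lines: connect two records with an edge whenever they share an exact subject string, then look for isolated records and overall density. The records below are toy data invented for illustration, and the functions are assumptions, not the study's actual implementation (which operated on millions of DPLA records).

```python
from itertools import combinations

# Toy records: id -> list of subject strings (hypothetical data).
records = {
    "rec1": ["Railroads", "Texas"],
    "rec2": ["Texas", "Maps"],
    "rec3": ["Railroads -- History"],  # pre-coordinated heading
    "rec4": [],                        # no subjects at all
}

def build_record_graph(records):
    """Add an edge between two records when they share an exact subject string."""
    edges = set()
    for a, b in combinations(records, 2):
        if set(records[a]) & set(records[b]):
            edges.add(frozenset((a, b)))
    return edges

def isolated(records, edges):
    """Records with no edges: unreachable from any other item via subjects."""
    connected = {r for e in edges for r in e}
    return sorted(set(records) - connected)

edges = build_record_graph(records)
n = len(records)
density = len(edges) / (n * (n - 1) / 2)  # share of possible pairs connected
print(isolated(records, edges), density)
```

Note that "Railroads -- History" does not connect to "Railroads" under exact-string matching, which mirrors the study's findings: splitting the pre-coordinated heading into separate facets would link rec3 to rec1 and raise connectivity, just as FAST-style faceting does at scale.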