In this paper, we define and apply representational stability analysis (ReStA), an intuitive way of analyzing neural language models. ReStA is a variant of the popular representational similarity analysis (RSA) in cognitive neuroscience. Whereas RSA can be used to compare representations in models, model components, and human brains, ReStA compares instances of the same model while systematically varying a single model parameter.
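The comparison described above can be illustrated with a small sketch. It is not the authors' implementation: it assumes cosine distance for the dissimilarity matrices and Pearson correlation for comparing them, and the names `rsa_score` and `stability_profile`, as well as the toy "instances", are hypothetical.

```python
import math
import random


def cosine_distance(u, v):
    """1 - cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def rdm_upper(reps):
    """Upper triangle of the stimulus-by-stimulus dissimilarity matrix, flattened."""
    n = len(reps)
    return [cosine_distance(reps[i], reps[j])
            for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def rsa_score(reps_a, reps_b):
    """RSA-style similarity: correlate the two dissimilarity structures."""
    return pearson(rdm_upper(reps_a), rdm_upper(reps_b))


def stability_profile(instances):
    """ReStA-style profile (assumed form): compare neighbouring instances of one
    model, ordered by the value of the single parameter that was varied."""
    return [rsa_score(a, b) for a, b in zip(instances, instances[1:])]


# Toy stand-ins for three instances of the same model (10 stimuli, 8 dimensions).
random.seed(0)
base = [[random.gauss(0, 1) for _ in range(8)] for _ in range(10)]
scaled = [[2.0 * x for x in row] for row in base]               # same geometry
noisy = [[x + random.gauss(0, 1) for x in row] for row in base]  # perturbed geometry
profile = stability_profile([base, scaled, noisy])
```

Because only the dissimilarity structure is compared, the two instances need not share a coordinate system or even a dimensionality; the first entry of `profile` stays at 1 (up to rounding) since rescaling leaves cosine distances unchanged.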
Abstract
• Purpose: Governments create large amounts of data. However, the publication of open data is often cumbersome, and there are no standard procedures and processes for opening data. This blocks the easy publication of government data. The purpose of this paper is to derive design principles for improving the open data publishing process of public organizations.
• Design/methodology/approach: Action Design Research (ADR) was employed to derive design principles. The literature was used as a foundation, and discussion sessions with civil servants were used to evaluate the usefulness of the principles.
• Findings: Barriers preventing easy and low-cost publication of open data were identified and connected to design principles, which can be used to guide the design of an open data publishing process. The five new principles are: 1) start thinking about the opening of data at the beginning of the process, 2) develop guidelines, especially about privacy and policy sensitivity of data, 3) provide decision support by integrating insight into the activities of other actors involved in the publishing process, 4) make data publication an integral, well-defined and standardized part of daily procedures and routines, and 5) monitor how the published data are reused.
• Research limitations/implications: The principles are derived using ADR in a single case. A next step can be to investigate multiple comparative case studies and detail the principles further. We recommend using these principles to develop a reference architecture.
• Practical implications: The design principles can be used by public organizations to improve their open data publishing processes. The design principles are derived from practice and discussed with practitioners. The discussions showed that the principles could improve the publication process.
Multilingual representations have mostly been evaluated based on their performance on specific tasks. In this article, we look beyond engineering goals and analyze the relations between languages in computational representations. We introduce a methodology for comparing languages based on their organization of semantic concepts. We propose to conduct an adapted version of representational similarity analysis of a selected set of concepts in computational multilingual representations. Using this analysis method, we can reconstruct a phylogenetic tree that closely resembles those assumed by linguistic experts. These results indicate that multilingual distributional representations which are only trained on monolingual text and bilingual dictionaries preserve relations between languages without the need for any etymological information. In addition, we propose a measure to identify semantic drift between language families. We perform experiments on word-based and sentence-based multilingual models and provide both quantitative results and qualitative examples. Analyses of semantic drift in multilingual representations can serve two purposes: they can indicate unwanted characteristics of the computational models and they provide a quantitative means to study linguistic phenomena across languages. The code is available at https://github.com/beinborn/SemanticDrift.
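As a rough illustration of how pairwise language distances could be turned into a tree, the sketch below uses average-linkage agglomerative clustering. This is an assumed choice, not necessarily the procedure used in the article; the function names, the language labels, and the distance values are hypothetical placeholders (each distance could be, for example, 1 minus an RSA score between two languages).

```python
from itertools import combinations


def average_distance(members_a, members_b, dist):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(dist[frozenset((a, b))] for a in members_a for b in members_b)
    return total / (len(members_a) * len(members_b))


def build_tree(labels, dist):
    """Average-linkage clustering; returns the tree as nested tuples."""
    # Each cluster is a pair: (subtree, list of languages it contains).
    clusters = [(label, [label]) for label in labels]
    while len(clusters) > 1:
        # Pick the two clusters that are closest on average.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: average_distance(
                clusters[ij[0]][1], clusters[ij[1]][1], dist
            ),
        )
        tree_i, members_i = clusters[i]
        tree_j, members_j = clusters[j]
        merged = ((tree_i, tree_j), members_i + members_j)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]


# Hypothetical placeholder languages and made-up distances, for illustration only.
labels = ["lang_a", "lang_b", "lang_c", "lang_d"]
dist = {frozenset(pair): 0.8 for pair in combinations(labels, 2)}
dist[frozenset(("lang_a", "lang_b"))] = 0.1
dist[frozenset(("lang_c", "lang_d"))] = 0.2

tree = build_tree(labels, dist)
print(tree)
```

With these placeholder values, the two close pairs are merged first and then joined at the root. The nested tuples can be compared against an expert-built phylogenetic tree by checking which languages end up in the same subtree.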