R. Choenni scite author profile

Blackbox Meets Blackbox: Representational Similarity & Stability Analysis of Neural Language Models and Brains

In this paper, we define and apply representational stability analysis (ReStA), an intuitive way of analyzing neural language models. ReStA is a variant of the popular representational similarity analysis (RSA) in cognitive neuroscience. While RSA can be used to compare representations in models, model components, and human brains, ReStA compares instances of the same model while systematically varying a single model parameter.
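
The comparison described above can be sketched in a few lines. This is a minimal illustration of the general RSA recipe, not the authors' implementation: it assumes cosine distance for the dissimilarity matrices and Spearman correlation for comparing them, and the toy vectors and all function names are hypothetical. For a ReStA-style comparison, the two inputs would be representations of the same stimuli taken from two instances of one model that differ in a single parameter.

```python
import math
from itertools import combinations


def cosine_distance(u, v):
    """1 - cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def rdm_upper(representations):
    """Flattened upper triangle of the representational dissimilarity matrix."""
    return [cosine_distance(u, v) for u, v in combinations(representations, 2)]


def ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def representational_similarity(reps_a, reps_b):
    """Spearman correlation between two RDMs (same stimuli, same order)."""
    return pearson(ranks(rdm_upper(reps_a)), ranks(rdm_upper(reps_b)))


# Toy data (assumed): three stimuli encoded by two hypothetical model instances.
instance_a = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
instance_b = [[2.0, 0.1], [1.7, 0.4], [0.1, 3.0]]
score = representational_similarity(instance_a, instance_b)
```

Because only the pairwise dissimilarities are compared, the two instances may use different vector sizes or scales, as long as they encode the same stimuli in the same order.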

Design principles for improving the process of publishing open data

Janssen

Choenni

Meijer

2014

Abstract

• Purpose: Governments create large amounts of data. However, the publication of open data is often cumbersome and there are no standard procedures and processes for opening data. This blocks the easy publication of government data. The purpose of this paper is to derive design principles for improving the open data publishing process of public organizations.
• Design/methodology/approach: Action Design Research (ADR) was employed to derive design principles. The literature was used as a foundation, and discussion sessions with civil servants were used to evaluate the usefulness of the principles.
• Findings: Barriers preventing easy and low-cost publication of open data were identified and connected to design principles, which can be used to guide the design of an open data publishing process. Five new principles are 1) start thinking about the opening of data at the beginning of the process, 2) develop guidelines, especially about privacy and policy sensitivity of data, 3) provide decision support by integrating insight in the activities of other actors involved in the publishing process, 4) make data publication an integral, well-defined and standardized part of daily procedures and routines, 5) monitor how the published data are reused.
• Research limitations/implications: The principles are derived using ADR in a single case. A next step can be to investigate multiple comparative case studies and detail the principles further. We recommend using these principles to develop a reference architecture.
• Practical implications: The design principles can be used by public organizations to improve their open data publishing processes. The design principles are derived from practice and discussed with practitioners. The discussions showed that the principles could improve the publication process.

On the Usability of Big (Social) Data

Choenni

Netten

Shoae-Bargh

et al. 2018

Semantic Drift in Multilingual Representations

Beinborn

Choenni

2020

Computational Linguistics

Multilingual representations have mostly been evaluated based on their performance on specific tasks. In this article, we look beyond engineering goals and analyze the relations between languages in computational representations. We introduce a methodology for comparing languages based on their organization of semantic concepts. We propose to conduct an adapted version of representational similarity analysis of a selected set of concepts in computational multilingual representations. Using this analysis method, we can reconstruct a phylogenetic tree that closely resembles those assumed by linguistic experts. These results indicate that multilingual distributional representations which are only trained on monolingual text and bilingual dictionaries preserve relations between languages without the need for any etymological information. In addition, we propose a measure to identify semantic drift between language families. We perform experiments on word-based and sentence-based multilingual models and provide both quantitative results and qualitative examples. Analyses of semantic drift in multilingual representations can serve two purposes: they can indicate unwanted characteristics of the computational models and they provide a quantitative means to study linguistic phenomena across languages.
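
One way to picture the tree reconstruction mentioned in the abstract is sketched below. It is an illustration under stated assumptions, not the published method: each language gets a dissimilarity matrix over the same concepts (cosine distance, assumed), languages are compared by the Euclidean distance between those matrices (assumed), and the tree is built with a simple average-linkage clustering (assumed). The language codes, toy vectors, and function names are all hypothetical.

```python
import math
from itertools import combinations


def cosine_distance(u, v):
    """1 - cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return 1.0 - dot / (math.hypot(*u) * math.hypot(*v))


def rdm(vectors):
    """Pairwise concept dissimilarities within one language, flattened."""
    return [cosine_distance(u, v) for u, v in combinations(vectors, 2)]


def language_distance(rdm_a, rdm_b):
    """Euclidean distance between two flattened RDMs (an assumed choice)."""
    return math.dist(rdm_a, rdm_b)


def average_linkage_tree(names, dist):
    """Greedy agglomerative clustering; returns a nested tuple of names.

    dist maps frozenset({a, b}) of language names to their distance.
    """
    clusters = {name: (name,) for name in names}  # tree node -> member names

    def average_distance(pair):
        a, b = pair
        members = [(x, y) for x in clusters[a] for y in clusters[b]]
        return sum(dist[frozenset(p)] for p in members) / len(members)

    while len(clusters) > 1:
        # Merge the two closest clusters into a new tree node.
        a, b = min(combinations(clusters, 2), key=average_distance)
        clusters[(a, b)] = clusters.pop(a) + clusters.pop(b)
    return next(iter(clusters))


# Toy data (assumed): the same three concepts embedded in four languages.
embeddings = {
    "nl": [[1.0, 0.0], [0.9, 0.2], [0.0, 1.0]],
    "de": [[1.0, 0.0], [0.8, 0.3], [0.0, 1.0]],
    "es": [[1.0, 0.0], [0.3, 0.8], [0.0, 1.0]],
    "it": [[1.0, 0.0], [0.2, 0.9], [0.0, 1.0]],
}
rdms = {lang: rdm(vecs) for lang, vecs in embeddings.items()}
dist = {
    frozenset((a, b)): language_distance(rdms[a], rdms[b])
    for a, b in combinations(rdms, 2)
}
tree = average_linkage_tree(list(rdms), dist)
```

In this toy setup the two languages that organize the middle concept similarly end up as siblings in the tree. With real multilingual embeddings, the vectors for each language would come from the model under study and the concept list would need to be aligned across languages.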

Semantic Drift in Multilingual Representations

Beinborn

Choenni

2019

Preprint

Multilingual representations have mostly been evaluated based on their performance on specific tasks. In this article, we look beyond engineering goals and analyze the relations between languages in computational representations. We introduce a methodology for comparing languages based on their organization of semantic concepts. We propose to conduct an adapted version of representational similarity analysis of a selected set of concepts in computational multilingual representations. Using this analysis method, we can reconstruct a phylogenetic tree that closely resembles those assumed by linguistic experts. These results indicate that multilingual distributional representations which are only trained on monolingual text and bilingual dictionaries preserve relations between languages without the need for any etymological information. In addition, we propose a measure to identify semantic drift between language families. We perform experiments on word-based and sentence-based multilingual models and provide both quantitative results and qualitative examples. Analyses of semantic drift in multilingual representations can serve two purposes: they can indicate unwanted characteristics of the computational models and they provide a quantitative means to study linguistic phenomena across languages. The code is available at https://github.com/beinborn/SemanticDrift.

