We argue that there is a need for substantially more research on the use of generative data models in the validation and evaluation of visualization techniques. For example, user studies will require the display of representative and unconfounded visual stimuli, while algorithms will need functional coverage and assessable benchmarks. However, data is often collected in a semi-automatic fashion or entirely hand-picked, which obscures the view of generality, impairs availability, and potentially violates privacy. There are some sub-domains of visualization that use synthetic data in the sense of generative data models, whereas others work with real-world-based data sets and simulations. Depending on the visualization domain, many generative data models are "side projects" as part of an ad-hoc validation of a techniques paper and thus neither reusable nor general-purpose. We review existing work on popular data collections and generative data models in visualization to discuss the opportunities and consequences for technique validation, evaluation, and experiment design. We distill handling and future directions, and discuss how we can engineer generative data models and how visualization research could benefit from more and better use of generative data models.
Language change is often assessed against a set of predetermined time periods in order to be able to trace its diachronic trajectory. This is problematic, since a predetermined periodization might obscure significant developments and lead to false assumptions about the data. Moreover, these time periods can be based on factors which are either arbitrary or non-linguistic, e.g., dividing the corpus data into equidistant stages or taking into account language-external events. Addressing this problem, in this paper we present a data-driven approach to periodization: 'DiaHClust'. Di-aHClust is based on iterative hierarchical clustering and offers a multi-layered perspective on change from text-level to broader time periods. We demonstrate the usefulness of Di-aHClust via a case study investigating syntactic change in Icelandic, modelling the syntactic system of the language in terms of vectors of syntactic change.
Historical change typically is the result of complex interactions between several linguistic factors. Identifying the relevant factors and understanding how they interact across the temporal dimension is the core remit of historical linguistics. With respect to corpus work, this entails a separate annotation, extraction and painstaking pair-wise comparison of the relevant bits of information. This paper presents a significant extension of HistoBankVis, a multilayer visualization system which allows a fast and interactive exploration of complex linguistic data. Linguistic factors can be understood as data dimensions which show complex interrelationships. We model these relationships with the Parallel Sets technique. We demonstrate the powerful potential of this technique by applying the system to understanding the interaction of case, grammatical relations and word order in the history of Icelandic.
In this paper we present a case study in which Visual Analytic methods for interactive data exploration are applied to the study of historical linguistics. We discuss why diachronic linguistic data poses special challenges for Visual Analytics and show how these are handled in a collaboratively developed web-based tool: HistoBankVis. HistoBankVis allows an immediate and efficient interaction with underlying diachronic data and we go through an investigation of the interplay between case marking and word order in Icelandic and Old Saxon to illustrate its features. We then discuss challenges posed by the lack of annotation standardization across different corpora as well as the problems we encountered with respect to errors, uncertainty and issues of data provenance. Overall we conclude that the integration of Visual Analytics methodology into the study of language change has an immense potential but that the full realization of its potential will depend on whether issues of data interoperability and annotation standards can be resolved.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.