Latent Dirichlet allocation (LDA) topic models are increasingly being used in communication research. Yet, questions regarding reliability and validity of the approach have received little attention thus far. In applying LDA to textual data, researchers need to tackle at least four major challenges that affect these criteria: (a) appropriate pre-processing of the text collection; (b) adequate selection of model parameters, including the number of topics to be generated; (c) evaluation of the model's reliability; and (d) the process of validly interpreting the resulting topics. We review the research literature dealing with these questions and propose a methodology that approaches these challenges. Our overall goal is to make LDA topic modeling more accessible to communication researchers and to ensure compliance with disciplinary standards. Consequently, we develop a brief hands-on user guide for applying LDA topic modeling. We demonstrate the value of our approach with empirical data from an ongoing research project.
We present a novel method to visualize multidimensional point clouds. While conventional visualization techniques, like scatterplot matrices or parallel coordinates, have issues with either overplotting of entities or handling many dimensions, we abstract the data using topological methods before presenting it. We assume the input points to be samples of a random variable with a high-dimensional probability distribution which we approximate using kernel density estimates on a suitably reconstructed mesh. From the resulting scalar field we extract the join tree and present it as a topological landscape, a visualization metaphor that utilizes the human capability of understanding natural terrains. In this landscape, dense clusters of points show up as hills. The nesting of hills indicates the nesting of clusters. We augment the landscape with the data points to allow selection and inspection of single points and point sets. We also present optimizations to make our algorithm applicable to large data sets and to allow interactive adaption of our visualization to the kernel window width used in the density estimation.
Purpose
An overview of the current use of handwritten text recognition (HTR) on archival manuscript material, as provided by the EU H2020 funded Transkribus platform. It explains HTR, demonstrates Transkribus, gives examples of use cases, highlights the affect HTR may have on scholarship, and evidences this turning point of the advanced use of digitised heritage content. The paper aims to discuss these issues.
Design/methodology/approach
This paper adopts a case study approach, using the development and delivery of the one openly available HTR platform for manuscript material.
Findings
Transkribus has demonstrated that HTR is now a useable technology that can be employed in conjunction with mass digitisation to generate accurate transcripts of archival material. Use cases are demonstrated, and a cooperative model is suggested as a way to ensure sustainability and scaling of the platform. However, funding and resourcing issues are identified.
Research limitations/implications
The paper presents results from projects: further user studies could be undertaken involving interviews, surveys, etc.
Practical implications
Only HTR provided via Transkribus is covered: however, this is the only publicly available platform for HTR on individual collections of historical documents at time of writing and it represents the current state-of-the-art in this field.
Social implications
The increased access to information contained within historical texts has the potential to be transformational for both institutions and individuals.
Originality/value
This is the first published overview of how HTR is used by a wide archival studies community, reporting and showcasing current application of handwriting technology in the cultural heritage sector.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.