Abstract. So far, there has been no comparative evaluation of the different approaches for text extraction from scholarly figures. To fill this gap, we have defined a generic pipeline for text extraction that abstracts from the existing approaches documented in the literature. In this paper, we use this generic pipeline to systematically evaluate and compare 32 configurations for text extraction over four datasets of scholarly figures of different origin and characteristics. In total, our experiments were run over more than 400 manually labeled figures. The experimental results show that the approach BS-4OS achieves the best F-measure of 0.67 for Text Location Detection and the best average Levenshtein distance of 4.71 between the recognized text and the gold standard on all four datasets using the Ocropy OCR engine.
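The two measures named in this abstract can be illustrated with a minimal sketch. The function names are hypothetical, and the abstract does not state how a detected text region is matched against the gold standard, so the true-positive, false-positive, and false-negative counts are simply taken as given inputs here.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn string a into string b."""
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]


def f_measure(true_pos: int, false_pos: int, false_neg: int) -> float:
    """Harmonic mean of precision and recall (F1).
    Assumption: the counts come from some region-matching rule
    that is not specified in the abstract."""
    predicted = true_pos + false_pos
    actual = true_pos + false_neg
    if predicted == 0 or actual == 0 or true_pos == 0:
        return 0.0
    precision = true_pos / predicted
    recall = true_pos / actual
    return 2 * precision * recall / (precision + recall)
```

For instance, `levenshtein(recognized, gold)` would be averaged over all text elements of a figure, while `f_measure` would be computed from the counts of correctly and incorrectly detected text regions.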
Abstract. The vast amount of scientific literature poses a challenge when one is trying to understand a previously unknown topic. Selecting a representative subset of documents that covers most of the desired content can address this challenge by presenting the user with a small subset of documents. We build on existing research on representative subset extraction and apply it in an information retrieval setting. Our document selection process consists of three steps: computation of the document representations, clustering, and selection of documents. We implement and compare two different document representations, two different clustering algorithms, and three different selection methods using a coverage and a redundancy metric. We execute our 36 experiments on two datasets, with 10 sample queries each, from different domains. The results show that there is no clear favorite and that we need to ask whether coverage and redundancy are sufficient for evaluating representative subsets.
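The three-step process described in this abstract (representation, clustering, selection) can be sketched as follows. The abstract does not name the concrete representations, clustering algorithms, or selection methods, so everything here is a stand-in chosen for illustration: term-frequency vectors, a small k-means-style loop using cosine similarity, and selection of the document closest to each cluster centroid. All function names are hypothetical.

```python
import math
from collections import Counter


def represent(doc: str) -> dict:
    """Step 1 (assumed): term-frequency vector of a document."""
    return dict(Counter(doc.lower().split()))


def cosine(u: dict, v: dict) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def centroid(vectors: list) -> dict:
    """Component-wise mean of a non-empty list of sparse vectors."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return {t: c / len(vectors) for t, c in total.items()}


def cluster(vectors: list, k: int, iterations: int = 10):
    """Step 2 (assumed): k-means-style clustering with cosine similarity.
    Centroids are initialised with the first k documents for determinism."""
    centroids = [dict(v) for v in vectors[:k]]
    assignment = [0] * len(vectors)
    for _ in range(iterations):
        assignment = [
            max(range(k), key=lambda c: cosine(v, centroids[c]))
            for v in vectors
        ]
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = centroid(members)
    return assignment, centroids


def select(vectors: list, assignment: list, centroids: list) -> list:
    """Step 3 (assumed): per cluster, pick the document nearest its centroid."""
    chosen = []
    for c, cen in enumerate(centroids):
        members = [i for i, a in enumerate(assignment) if a == c]
        if members:
            chosen.append(max(members, key=lambda i: cosine(vectors[i], cen)))
    return chosen


docs = [
    "graph neural network training",
    "protein folding structure",
    "neural network graph models",
    "protein structure prediction folding",
]
vectors = [represent(d) for d in docs]
assignment, centroids = cluster(vectors, k=2)
chosen = select(vectors, assignment, centroids)
```

With these four toy documents, the sketch groups the two machine-learning texts and the two protein texts separately and returns one document index per group. A real setup would swap in the representations, clustering algorithms, and selection methods actually compared in the paper.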
Bar charts are widely used to visualize core results of experiments in research papers or to display statistics in news, media, and other reports. However, visualizations like bar charts are mostly manually designed, static presentations of data without the option of adaptation to a user's needs. So far, it is unknown whether interactivity improves the understanding of charts. In this work, we compare static with dynamic bar charts, which offer an interactive stacking option. We assess the efficiency, effectiveness, and satisfaction when answering questions regarding the content of a bar chart. An eye tracker is used to measure the efficiency. We have conducted a between-group experiment with 38 participants. While one group had to solve the aggregation tasks using stackable, i.e., interactive, bar charts, the other group was limited to static visualizations. Even though new interactive features require familiarization, we found that the stacking feature significantly helps in completing the tasks with respect to efficiency, effectiveness, and satisfaction for bar charts of varying complexity.