Traditional document similarity measures provide a coarse-grained distinction between similar and dissimilar documents. Typically, they do not consider in what aspects two documents are similar. This limits the granularity of applications like recommender systems that rely on document similarity. In this paper, we extend similarity with aspect information by performing a pairwise document classification task. We evaluate our aspect-based document similarity approach for research papers. Paper citations indicate the aspect-based similarity, i. e., the title of a section in which a citation occurs acts as a label for the pair of citing and cited paper. We apply a series of Transformer models such as RoBERTa, ELECTRA, XLNet, and BERT variations and compare them to an LSTM baseline. We perform our experiments on two newly constructed datasets of 172,073 research paper pairs from the ACL Anthology and CORD-19 corpus. According to our results, SciBERT is the best performing system with F1-scores of up to 0.83. A qualitative analysis validates our quantitative results and indicates that aspect-based document similarity indeed leads to more fine-grained recommendations.
Summarization is a widespread method for handling very large graphs. The task of structural graph summarization is to compute a concise but meaningful synopsis of the key structural information of a graph. As summaries may be used for many different purposes, there is no single concept or model of graph summaries. We have studied existing structural graph summaries for large-scale (semantic) graphs. Despite their different concepts and purposes, we found commonalities in the graph structures they capture. We use these commonalities to provide for the first time a formally defined common model, FLUID (FLexible graph sUmmarIes for Data graphs), that allows us to flexibly define structural graph summaries. FLUID allows graph summaries to be quickly defined, adapted, and compared for different purposes and datasets. To this end, FLUID provides features of structural summarization based on equivalence relations such as distinction of types and properties, direction of edges, bisimulation, and inference. We conduct a detailed complexity analysis of the features provided by FLUID. We show that graph summaries defined with FLUID can be computed in the worst case in time O(n 2 ) w.r.t. n, the number of edges in the data graph. An empirical analysis of large-scale web graphs with billions of edges indicates a typical running time of Θ(n). Based on the formal FLUID model, one can quickly define and modify various structural graph summaries from the literature and beyond.
Recent advances in the area of legal information systems have led to a variety of applications that promise support in processing and accessing legal documents. Unfortunately, these applications have various limitations, e. g., regarding scope or extensibility. Furthermore, we do not observe a trend towards open access in digital libraries in the legal domain as we observe in other domains, e. g., economics of computer science. To improve open access in the legal domain, we present our approach for an open source platform to transparently process and access Legal Open Data. This enables the sustainable development of legal applications by offering a single technology stack. Moreover, the approach facilitates the development and deployment of new technologies. As proof of concept, we implemented six technologies and generated metadata for more than 250,000 German laws and court decisions. Thus, we can provide users of our platform not only access to legal documents, but also the contained information.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.