The study of communities in time-varying graphs has largely been limited to their detection and identification across time. However, the temporal dimension gives us the opportunity to analyze the interaction patterns of communities and to understand how each individual community grows or shrinks and gains importance over time. This paper, for the first time, systematically studies the temporal interaction patterns of communities using a large-scale citation network (directed and unweighted) of computer science. Each community in a citation network is naturally defined by a research field (i.e., it acts as ground truth), and the interactions among fields through citations in real time can unfold the landscape of dynamic research trends in the computer science domain over the last fifty years. These interactions are quantified by a metric called inwardness, which captures the effect of local citations to express the degree of authoritativeness of a community (research field) at a particular time instance. Using exhaustive statistical analysis, we put forward several arguments to unveil the reasons behind the temporal changes in the inwardness of different communities. The measurements (importance of a field) are compared with the project funding statistics of the NSF, and the two are found to be in sync. We believe that this measurement study on large real-world data is an important initial step towards understanding the dynamics of cluster interactions in a temporal environment. The paper thus systematically outlines a new avenue of research that one can pursue after community detection.
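As a minimal sketch of the kind of metric described above, one could define inwardness as the fraction of citations received by a field's papers, up to a given year, that originate from the same field. (The paper's exact definition may differ; the function, variable names, and toy data below are illustrative assumptions.)

```python
def inwardness(citations, community_of, field, year):
    """Illustrative inwardness: among citations received (up to `year`)
    by papers in `field`, the fraction that come from the same field.
    `citations` is a list of directed edges (src, dst, year): src cites dst.
    """
    internal = total = 0
    for src, dst, t in citations:
        if t <= year and community_of[dst] == field:
            total += 1
            if community_of[src] == field:
                internal += 1
    return internal / total if total else 0.0

# Toy citation network: papers a, b, d belong to "ML"; c belongs to "DB".
citations = [("a", "b", 2000), ("c", "b", 2001), ("d", "b", 2002)]
community_of = {"a": "ML", "b": "ML", "c": "DB", "d": "ML"}
```

On this toy data, the "ML" field's inwardness rises from 0.5 in 2001 to 2/3 in 2002 as another intra-field citation arrives, mirroring the temporal changes the paper tracks.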
We introduce POLAR, a framework that adds interpretability to pre-trained word embeddings via the adoption of semantic differentials. Semantic differentials are a psychometric construct for measuring the semantics of a word by analysing its position on a scale between two polar opposites (e.g., cold vs. hot, soft vs. hard). The core idea of our approach is to transform existing, pre-trained word embeddings via semantic differentials into a new "polar" space with interpretable dimensions defined by such polar opposites. Our framework also allows for selecting the most discriminative dimensions from a set of polar dimensions provided by an oracle, i.e., an external source. We demonstrate the effectiveness of our framework by deploying it on various downstream tasks, in which our interpretable word embeddings achieve performance comparable to that of the original word embeddings. We also show that the interpretable dimensions selected by our framework align with human judgement. Together, these results demonstrate that interpretability can be added to word embeddings without compromising performance. Our work is relevant for researchers and engineers interested in interpreting pre-trained word embeddings.
CCS CONCEPTS: • Computing methodologies → Machine learning approaches.
KEYWORDS: word embeddings, neural networks, interpretable, semantic differential
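The core transform can be sketched by projecting each word vector onto directions spanned by antonym pairs; the embeddings, pair list, and simple dot-product projection below are illustrative assumptions, not the framework's actual change-of-basis computation.

```python
import numpy as np

# Toy pretrained embeddings (assumed, 3-dimensional for illustration).
emb = {
    "hot":  np.array([1.0, 0.2, 0.0]),
    "cold": np.array([-1.0, 0.1, 0.0]),
    "hard": np.array([0.0, 1.0, 0.3]),
    "soft": np.array([0.1, -1.0, 0.2]),
    "lava": np.array([0.9, 0.5, 0.1]),
}

# Each interpretable dimension is the direction from one pole to its opposite.
polar_pairs = [("cold", "hot"), ("soft", "hard")]
directions = np.stack([emb[b] - emb[a] for a, b in polar_pairs])

def to_polar(word):
    """Score a word on each polar dimension via a dot product;
    a positive score leans toward the second pole of the pair."""
    return directions @ emb[word]
```

In this toy space, `to_polar("lava")` scores positive on the cold-to-hot dimension while `to_polar("cold")` scores negative, which is the kind of human-readable coordinate the polar space provides.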
Wikipedia can justifiably be called a behemoth, considering the sheer volume of content that is added to or removed from its several projects every minute. This creates an immense scope in the field of natural language processing for developing automated tools for content moderation and review. In this paper we propose the Self Attentive Revision Encoder (StRE), which leverages the orthographic similarity of lexical units to predict the quality of new edits. In contrast to existing propositions, which primarily employ features like page reputation, editor activity, or rule-based heuristics, we utilize the textual content of the edits, which, we believe, contains superior signatures of their quality. More specifically, we deploy deep encoders to generate representations of the edits from their text content, which we then leverage to infer quality. We further contribute a novel dataset containing ∼21M revisions across 32K Wikipedia pages and demonstrate that StRE outperforms existing methods by a significant margin: at least 17% and at most 103%. Our pretrained model achieves these results after retraining on a set as small as 20% of the edits in a Wikipedia page. This, to the best of our knowledge, is also the first attempt at employing deep language models in the enormous domain of automated content moderation and review in Wikipedia.
1 en.wikipedia.org/Wikipedia:List of policies
2 stats.wikimedia.org/EN/PlotsPngEditHistoryTop.htm
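The self-attentive pooling at the heart of such an encoder can be sketched in a few lines; the weight-free scaled dot-product attention and toy token vectors below are illustrative assumptions (StRE's actual encoder is a trained deep model with learned parameters).

```python
import numpy as np

def self_attention(X):
    """Single-head scaled dot-product self-attention over token vectors X
    (rows = tokens), with no learned projections, for illustration only."""
    d = X.shape[1]
    scores = X @ X.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)  # rows sum to 1
    return weights @ X

# Toy token embeddings for an edit's text (3 tokens, 4 dimensions).
tokens = np.array([[0.1, 0.3, 0.0, 0.2],
                   [0.5, 0.1, 0.4, 0.0],
                   [0.2, 0.2, 0.1, 0.3]])

# Mean-pool the attended tokens into a fixed-size edit representation,
# which a downstream classifier could score for quality.
edit_repr = self_attention(tokens).mean(axis=0)
```

Because each attended token is a convex combination of the inputs, the pooled representation stays within the range of the original token values regardless of edit length.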