Norms are central to how online communities are governed. Yet, norms are also emergent, arise from interaction, and can vary significantly between communities-making them challenging to study at scale. In this paper, we study community norms on Reddit in a large-scale, empirical manner. Via 2.8M comments removed by moderators of 100 top subreddits over 10 months, we use both computational and qualitative methods to identify three types of norms: macro norms that are universal to most parts of Reddit; meso norms that are shared across certain groups of subreddits; and micro norms that are specific to individual, relatively unique subreddits. Given the size of Reddit's user base-and the wide range of topics covered by different subreddits-we argue this represents the first large-scale study of norms across disparate online communities. In other words, these findings shed light on what Reddit values, and how widely-held those values are. We conclude by discussing implications for the design of new and existing online communities.
Thousands of new news articles appear daily in outlets in different languages. Understanding which articles refer to the same story can not only improve applications like news aggregation but enable cross-linguistic analysis of media consumption and attention. However, assessing the similarity of stories in news articles is challenging due to the different dimensions in which a story might vary, e.g., two articles may have substantial textual overlap but describe similar events that happened years apart. To address this challenge, we introduce a new dataset of nearly 10,000 news article pairs spanning 18 language combinations annotated for seven dimensions of similarity as SemEval 2022 Task 8. Here, we present an overview of the task, the best performing submissions, and the frontiers and challenges for measuring multilingual news article similarity. While the participants of this SemEval task contributed very strong models, achieving up to 0.818 correlation with gold standard labels across languages, human annotators are capable of reaching higher correlations, suggesting space for further progress.
Conspiracy theories are omnipresent in online discussions---whether to explain a late-breaking event that still lacks official report or to give voice to political dissent. Conspiracy theories evolve, multiply, and interconnect, further complicating efforts to understand them and to limit their propagation. It is therefore crucial to develop scalable methods to examine the nature of conspiratorial discussions in online communities. What do users talk about when they discuss conspiracy theories online? What are the recurring elements in their discussions? What do these elements tell us about the way users think? This work answers these questions by analyzing over ten years of discussions in r/conspiracy---an online community on Reddit dedicated to conspiratorial discussions. We focus on the key elements of a conspiracy theory: the conspiratorial agents, the actions they perform, and their targets. By computationally detecting agent?action?target triplets in conspiratorial statements, and grouping them into semantically coherent clusters, we develop a notion of narrative-motif to detect recurring patterns of triplets. For example, a narrative-motif such as "governmental agency-controls-communications" represents the various ways in which multiple conspiratorial statements denote how governmental agencies control information. Thus, narrative-motifs expose commonalities between multiple conspiracy theories even when they refer to different events or circumstances. In the process, these representations help us understand how users talk about conspiracy theories and offer us a means to interpret what they talk about. Our approach enables a population-scale study of conspiracy theories in alternative news and social media with implications for understanding their adoption and combating their spread.
Research has focused on automated methods to effectively detect sexism online. Although overt sexism seems easy to spot, its subtle forms and manifold expressions are not. In this paper, we outline the different dimensions of sexism by grounding them in their implementation in psychological scales. From the scales, we derive a codebook for sexism in social media, which we use to annotate existing and novel datasets, surfacing their limitations in breadth and validity with respect to the construct of sexism. Next, we leverage the annotated datasets to generate adversarial examples, and test the reliability of sexism detection methods. Results indicate that current machine learning models pick up on a very narrow set of linguistic markers of sexism and do not generalize well to out-of-domain examples. Yet, including diverse data and adversarial examples at training time results in models that generalize better and that are more robust to artifacts of data collection. By providing a scale-based codebook and insights regarding the shortcomings of the state-of-the-art, we hope to contribute to the development of better and broader models for sexism detection, including reflections on theory-driven approaches to data collection.
No abstract
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.