We introduce and illustrate a new approach to measuring and mitigating unintended bias in machine learning models. Our definition of unintended bias is parameterized by a test set and a subset of input features. We illustrate how this can be used to evaluate text classifiers using a synthetic test set and a public corpus of comments annotated for toxicity from Wikipedia Talk pages. We also demonstrate how imbalances in training data can lead to unintended bias in the resulting models, and therefore potentially unfair applications. We use a set of common demographic identity terms as the subset of input features on which we measure bias. This technique permits analysis in the common scenario where demographic information on authors and readers is unavailable, so that bias mitigation must focus on the content of the text itself. The mitigation method we introduce is an unsupervised approach based on balancing the training dataset. We demonstrate that this approach reduces the unintended bias without compromising overall model quality.
Unintended bias in Machine Learning can manifest as systemic differences in performance for different demographic groups, potentially compounding existing challenges to fairness in society at large. In this paper, we introduce a suite of threshold-agnostic metrics that provide a nuanced view of this unintended bias, by considering the various ways that a classifier's score distribution can vary across designated groups. We also introduce a large new test set of online comments with crowd-sourced annotations for identity references. We use this to show how our metrics can be used to find new and potentially subtle unintended bias in existing public models. CCS CONCEPTS• Computing methodologies → Supervised learning by classification; Model verification and validation. ACM Reference Format:
The progress of science is punctuated by the advent of revolutionary technologies that provide new ways and scales to formulate scientific questions and advance knowledge. Following on from electron microscopy, cell culture and PCR, next-generation sequencing is one of these methodologies that is now changing the way that we understand viruses, particularly in the areas of genome sequencing, evolution, ecology, discovery and transcriptomics. Possibilities for these methodologies are only limited by our scientific imagination and, to some extent, by their cost, which has restricted their use to relatively small numbers of samples. Challenges remain, including the storage and analysis of the large amounts of data generated. As the chemistries employed mature, costs will decrease. In addition, improved methods for analysis will become available, opening yet further applications in virology including routine diagnostic work on individuals, and new understanding of the interaction between viral and host transcriptomes. An exciting era of viral exploration has begun, and will set us new challenges to understand the role of newly discovered viral diversity in both disease and health.
One of the main challenges online social systems face is the prevalence of antisocial behavior, such as harassment and personal attacks. In this work, we introduce the task of predicting from the very start of a conversation whether it will get out of hand. As opposed to detecting undesirable behavior after the fact, this task aims to enable early, actionable prediction at a time when the conversation might still be salvaged.To this end, we develop a framework for capturing pragmatic devices-such as politeness strategies and rhetorical prompts-used to start a conversation, and analyze their relation to its future trajectory. Applying this framework in a controlled setting, we demonstrate the feasibility of detecting early warning signs of antisocial behavior in online discussions. * Corresponding senior author.
String diagrams are a powerful tool for reasoning about physical processes, logic circuits, tensor networks, and many other compositional structures. The distinguishing feature of these diagrams is that edges need not be connected to vertices at both ends, and these unconnected ends can be interpreted as the inputs and outputs of a diagram. In this paper, we give a concrete construction for string diagrams using a special kind of typed graph called an open-graph. While the category of open-graphs is not itself adhesive, we introduce the notion of a selective adhesive functor, and show that such a functor embeds the category of open-graphs into the ambient adhesive category of typed graphs. Using this functor, the category of open-graphs inherits "enough adhesivity" from the category of typed graphs to perform double-pushout (DPO) graph rewriting. A salient feature of our theory is that it ensures rewrite systems are "type-safe" in the sense that rewriting respects the inputs and outputs. This formalism lets us safely encode the interesting structure of a computational model, such as evaluation dynamics, with succinct, explicit rewrite rules, while the graphical representation absorbs many of the tedious details. Although topological formalisms exist for string diagrams, our construction is discreet, finitary, and enjoys decidable algorithms for composition and rewriting. We also show how open-graphs can be parametrised by graphical signatures, similar to the monoidal signatures of Joyal and Street, which define types for vertices in the diagrammatic language and constraints on how they can be connected. Using typed open-graphs, we can construct free symmetric monoidal categories, PROPs, and more general monoidal theories. Thus open-graphs give us a handle for mechanised reasoning in monoidal categories.
The damage personal attacks cause to online discourse motivates many platforms to try to curb the phenomenon. However, understanding the prevalence and impact of personal attacks in online platforms at scale remains surprisingly difficult. The contribution of this paper is to develop and illustrate a method that combines crowdsourcing and machine learning to analyze personal attacks at scale. We show an evaluation method for a classifier in terms of the aggregated number of crowd-workers it can approximate. We apply our methodology to English Wikipedia, generating a corpus of over 100k high quality human-labeled comments and 63M machine-labeled ones from a classifier that is as good as the aggregate of 3 crowd-workers, as measured by the area under the ROC curve and Spearman correlation. Using this corpus of machinelabeled scores, our methodology allows us to explore some of the open questions about the nature of online personal attacks. This reveals that the majority of personal attacks on Wikipedia are not the result of a few malicious users, nor primarily the consequence of allowing anonymous contributions from unregistered users.1
We have developed a program for inductive theory formation, called IsaCoSy, which synthesises conjectures 'bottom-up' from the available constants and free variables. The synthesis process is made tractable by only generating irreducible terms, which are then filtered through counter-example checking and passed to the automatic inductive prover IsaPlanner. The main technical contribution is the presentation of a constraint mechanism for synthesis. As theorems are discovered, this generates additional constraints on the synthesis process.We evaluate IsaCoSy as a tool for automatically generating the background theories one would expect in a mature proof assistant, such as the Isabelle system. The results show that IsaCoSy produces most, and sometimes all, of the theorems in the Isabelle libraries. The number of additional un-interesting theorems are small enough to be easily pruned by hand.
Cytokines stimulate inflammatory defenses against viral infections. In order to evade host defenses, viruses have developed strategies to counteract antiviral cytokines. African swine fever virus (ASFV) is a large, double-stranded DNA virus that infects macrophages. This study demonstrates that ASFV effectively inhibited phorbol myristic acid-induced synthesis of antiviral, proinflammatory cytokines alpha interferon, tumor necrosis factor alpha, and interleukin-8 in infected macrophages as assessed by enzyme-linked immunosorbent assay and reverse transcriptase PCR. In contrast, levels of mRNA and protein for transforming growth factor , an anti-inflammatory cytokine, were increased by ASFV infection, suggesting that ASFV-induced inhibition of cytokine synthesis may be limited to cytokines activated by NFB. An interleukin-8 promoter, containing an NFB enhancer site, driving expression of a luciferase reporter gene was used to show that NFB-dependent transcription was inhibited by the virus and by a cloned ASFV gene, A238L. This gene encodes a protein with homology to IB, the inhibitor of NFB. Electrophoretic mobility shift assay showed that cells expressing the A238L gene inhibited NFB binding to DNA. These results suggest that the A238L gene product interacts with NFB to prevent transcription and downregulate proinflammatory cytokine production. This novel viral evasion strategy encoded in a single IB-like protein may be capable of inhibiting most macrophage NFBdependent antiviral mechanisms and may provide insights into how ASFV causes a fatal hemorrhagic disease of domestic pigs and a persistent infection in the African warthog, which is its natural permissive host.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
334 Leonard St
Brooklyn, NY 11211
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.