This paper aims to help structure the risk landscape associated with large-scale Language Models (LMs). In order to foster advances in responsible innovation, an in-depth understanding of the potential risks posed by these models is needed. A wide range of established and anticipated risks are analysed in detail, drawing on multidisciplinary literature from computer science, linguistics, and social sciences.
We propose and analyze an alternative approach to off-policy multi-step temporal-difference learning, in which off-policy returns are corrected with the current Q-function in terms of rewards, rather than with the target policy in terms of transition probabilities. We prove that such approximate corrections are sufficient for off-policy convergence in both policy evaluation and control, under certain conditions. These conditions relate the distance between the target and behavior policies, the eligibility trace parameter, and the discount factor, and formalize an underlying tradeoff in off-policy TD(λ). We illustrate this theoretical relationship empirically on a continuous-state control task.
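The reward-based correction described above can be sketched as a λ-return built from TD errors that use the target policy's expected Q-value at the next state. This is a minimal illustrative implementation, not the paper's exact algorithm; the function name and inputs (per-step rewards, Q(s_k, a_k), and E_π Q(s_{k+1}, ·)) are assumptions for the sketch.

```python
def q_lambda_return(rewards, q_sa, exp_q_next, gamma, lam):
    """Off-policy lambda-return corrected with the current Q-function
    in reward terms (no importance-sampling ratios).

    rewards[k]    : reward received after step k
    q_sa[k]       : current estimate Q(s_k, a_k)
    exp_q_next[k] : E_{a ~ pi} Q(s_{k+1}, a) under the target policy
                    (0.0 at a terminal step)
    """
    T = len(rewards)
    # TD errors: the "correction" replaces the taken next action's value
    # with the target policy's expectation, expressed in reward terms.
    deltas = [rewards[k] + gamma * exp_q_next[k] - q_sa[k] for k in range(T)]
    # G_t = Q(s_t, a_t) + sum_{k >= t} (gamma * lam)^(k - t) * delta_k,
    # accumulated with a single backward pass.
    g = 0.0
    returns = [0.0] * T
    for k in reversed(range(T)):
        g = deltas[k] + gamma * lam * g
        returns[k] = q_sa[k] + g
    return returns
```

With λ = 0 this reduces to the one-step expected backup r + γ E_π Q(s', ·); larger λ propagates corrections further back, which is where the distance between target and behavior policies enters the convergence conditions.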
We propose a general approach to fairness based on transporting the distributions corresponding to different sensitive attributes to a common distribution. We use optimal transport theory to derive target distributions and methods that achieve fairness with minimal changes to the unfair model. Our approach applies to both classification and regression problems, can enforce different notions of fairness, and enables a Pareto-optimal trade-off between accuracy and fairness. We demonstrate that it outperforms previous approaches on several benchmark fairness datasets.
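In one dimension, optimal transport reduces to a monotone quantile map, so the idea of transporting groups to a common target can be sketched by mapping each group's scores to the Wasserstein barycenter of the group distributions. This is an illustrative sketch, not the paper's method; the function name, the 101-point quantile grid, and the interpolation parameter `t` are assumptions.

```python
import numpy as np

def ot_repair(scores_a, scores_b, t=1.0):
    """Map two groups' 1-D score distributions toward their Wasserstein
    barycenter via quantile matching. t=1 fully equalizes the score
    distributions; t in (0, 1) interpolates between the original and
    repaired scores, tracing an accuracy/fairness trade-off."""
    qs = np.linspace(0.0, 1.0, 101)
    qa = np.quantile(scores_a, qs)          # group A quantile function
    qb = np.quantile(scores_b, qs)          # group B quantile function
    bary = 0.5 * (qa + qb)                  # barycenter (equal group weights)

    def repair(scores, group_q):
        ranks = np.interp(scores, group_q, qs)   # score -> quantile rank
        target = np.interp(ranks, qs, bary)      # rank -> barycenter score
        return (1.0 - t) * scores + t * target   # partial transport
    return (repair(np.asarray(scores_a, float), qa),
            repair(np.asarray(scores_b, float), qb))
```

After full repair (t = 1) both groups share the barycenter distribution, a score-level analogue of demographic parity; shrinking t keeps the repaired model closer to the original, unfair one.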
Credit assignment in reinforcement learning is the problem of measuring an action's influence on future rewards. In particular, this requires separating skill from luck, i.e., disentangling the effect of an action on rewards from that of external factors and subsequent actions. To achieve this, we adapt the notion of counterfactuals from causality theory to a model-free RL setup. The key idea is to condition value functions on future events, by learning to extract relevant information from a trajectory. We then propose to use these as future-conditional baselines and critics in policy gradient algorithms, and we develop a valid, practical variant with provably lower variance. Unbiasedness is achieved by constraining the hindsight information to contain no information about the agent's actions. We demonstrate the efficacy and validity of our algorithm on a number of illustrative problems.
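The skill-versus-luck intuition can be illustrated on a toy bandit: if the baseline conditions on exogenous noise ("luck") that carries no information about the action, the policy-gradient estimator stays unbiased but its variance drops. This is a hand-rolled toy, not the paper's algorithm or tasks; the policy parameterization, noise scale, and sample count are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = 0.3  # logit of a 2-armed Bernoulli policy (illustrative)

def grad_estimates(use_hindsight_baseline, n=200000):
    """Score-function gradient estimate of E[r] w.r.t. theta, with and
    without a baseline conditioned on hindsight (the exogenous noise)."""
    p = 1.0 / (1.0 + np.exp(-theta))       # P(a = 1)
    a = rng.random(n) < p                  # sampled actions
    luck = rng.normal(0.0, 3.0, n)         # exogenous noise, action-independent
    r = a.astype(float) + luck             # reward = skill (action) + luck
    score = a - p                          # d/dtheta log pi(a | theta)
    # Hindsight baseline: conditions on observed luck, which contains no
    # information about the action, so E[score * b] = 0 and the estimator
    # remains unbiased while most of the luck-driven variance cancels.
    b = luck if use_hindsight_baseline else 0.0
    g = score * (r - b)
    return g.mean(), g.var()
```

The true gradient here is p(1 − p), since only the action's own contribution to the reward depends on theta; the baseline removes the luck term from the estimator without shifting its mean.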
Features associated with an object or its surfaces in natural scenes tend to vary coherently in space and time. In the psychological literature, these coherent covariations have been considered important for neural systems to acquire models of objects and object categories. From a statistical inference perspective, such coherent covariation can provide a mechanism for learning the statistical priors in natural scenes that are useful for probabilistic inference. In this article, we present neurophysiological experimental observations in the early visual cortex that provide insights into how correlation structures in visual scenes are encoded by neuronal tuning and connections among neurons. The key insight is that correlated structures in visual scenes result in correlated neuronal activities, which shape the tuning properties of individual neurons and the connections between them, embedding Gestalt-related computational constraints or priors for surface inference. Extending these concepts to the inferotemporal cortex suggests a representational framework that is distinct from the traditional feed-forward hierarchy of invariant object representation and recognition. In this framework, lateral connections among view-based neurons, learned from the temporal association of the object views observed over time, can form a linked graph structure with local dependency, akin to a dense aspect graph in computer vision. This web-like graph allows view-invariant object representation to be created using sparse feed-forward connections, while maintaining the explicit representation of the different views. Thus, it can serve as an effective prior model for generating predictions of future incoming views to facilitate object inference.