This paper aims to help structure the risk landscape associated with large-scale Language Models (LMs). In order to foster advances in responsible innovation, an in-depth understanding of the potential risks posed by these models is needed. A wide range of established and anticipated risks are analysed in detail, drawing on multidisciplinary literature from computer science, linguistics, and social sciences.
We propose and analyze an alternate approach to off-policy multi-step temporal difference learning, in which off-policy returns are corrected with the current Q-function in terms of rewards, rather than with the target policy in terms of transition probabilities. We prove that such approximate corrections are sufficient for off-policy convergence both in policy evaluation and control, provided certain conditions. These conditions relate the distance between the target and behavior policies, the eligibility trace parameter and the discount factor, and formalize an underlying tradeoff in off-policy TD(λ). We illustrate this theoretical relationship empirically on a continuous-state control task.
We propose a general approach to fairness based on transporting distributions corresponding to different sensitive attributes to a common distribution. We use optimal transport theory to derive target distributions and methods that allow us to achieve fairness with minimal changes to the unfair model. Our approach is applicable to both classification and regression problems, can enforce different notions of fairness, and enable us to achieve a Pareto-optimal trade-off between accuracy and fairness. We demonstrate that it outperforms previous approaches in several benchmark fairness datasets.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.