Individual characteristics in human decision-making are often quantified by fitting a parametric cognitive model to subjects' behavior and then studying differences between subjects in the associated parameter space. However, these models often fit behavior more poorly than recurrent neural networks (RNNs), which are more flexible and make fewer assumptions about the underlying decision-making processes. Unfortunately, the parameter and latent activity spaces of RNNs are generally high-dimensional and uninterpretable, making it hard to use them to study individual differences. Here, we show how to benefit from the flexibility of RNNs while representing individual differences in a low-dimensional and interpretable space. To achieve this, we propose a novel end-to-end learning framework in which an encoder is trained to map the behavior of subjects into a low-dimensional latent space. These low-dimensional representations are used to generate the parameters of individual RNNs corresponding to the decision-making process of each subject. We introduce terms into the loss function that ensure that the latent dimensions are informative and disentangled, i.e., encouraged to have distinct effects on behavior. This allows them to align with separate facets of individual differences. We illustrate the performance of our framework on synthetic data as well as a dataset including the behavior of patients with psychiatric disorders.
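A minimal sketch of this kind of architecture may help make the abstract concrete. This is not the authors' code: the GRU encoder, the einsum-based rollout, and the covariance-based disentanglement penalty are all illustrative assumptions standing in for the paper's unspecified choices.

```python
import torch
import torch.nn as nn

class BehaviorEncoder(nn.Module):
    """Map a subject's trial sequence to a low-dimensional latent code z."""
    def __init__(self, obs_dim, latent_dim, hidden=64):
        super().__init__()
        self.gru = nn.GRU(obs_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, latent_dim)

    def forward(self, trials):                 # trials: (B, T, obs_dim)
        _, h = self.gru(trials)                # h: (1, B, hidden)
        return self.head(h.squeeze(0))         # z: (B, latent_dim)

class HyperRNN(nn.Module):
    """Hypernetwork: z -> weights of a small per-subject vanilla RNN."""
    def __init__(self, latent_dim, obs_dim, n_actions, hidden=8):
        super().__init__()
        self.obs_dim, self.n_actions, self.hidden = obs_dim, n_actions, hidden
        n_params = (hidden * obs_dim + hidden * hidden + hidden
                    + n_actions * hidden + n_actions)
        self.gen = nn.Linear(latent_dim, n_params)

    def forward(self, z, trials):
        B, T, _ = trials.shape
        H, A = self.hidden, self.n_actions
        p, i = self.gen(z), 0
        def take(*shape):                      # unpack flat parameter vector
            nonlocal i
            n = 1
            for s in shape: n *= s
            chunk = p[:, i:i + n].reshape(B, *shape); i += n
            return chunk
        W_in, W_rec, b_h = take(H, self.obs_dim), take(H, H), take(H)
        W_out, b_out = take(A, H), take(A)
        h, logits = trials.new_zeros(B, H), []
        for t in range(T):                     # per-subject RNN rollout
            x = trials[:, t]
            h = torch.tanh(torch.einsum('bho,bo->bh', W_in, x)
                           + torch.einsum('bhk,bk->bh', W_rec, h) + b_h)
            logits.append(torch.einsum('bah,bh->ba', W_out, h) + b_out)
        return torch.stack(logits, dim=1)      # (B, T, n_actions)

def decorrelation_penalty(z):
    """Push latent dimensions toward zero covariance -- one simple way
    to encourage each dimension to have a distinct effect."""
    zc = z - z.mean(dim=0)
    cov = zc.T @ zc / (z.shape[0] - 1)
    off_diag = cov - torch.diag(torch.diag(cov))
    return (off_diag ** 2).sum()

# Toy end-to-end step on random "behavior": 32 subjects, 50 trials each.
B, T, obs_dim, n_actions, latent_dim = 32, 50, 4, 2, 3
enc = BehaviorEncoder(obs_dim, latent_dim)
dec = HyperRNN(latent_dim, obs_dim, n_actions)
trials = torch.randn(B, T, obs_dim)
actions = torch.randint(n_actions, (B, T))
z = enc(trials)
logits = dec(z, trials)
nll = nn.functional.cross_entropy(logits.reshape(-1, n_actions),
                                  actions.reshape(-1))
loss = nll + 0.1 * decorrelation_penalty(z)   # fit behavior + disentangle
loss.backward()
```

The key structural point is that gradients flow through the generated RNN weights back into the encoder, so behavior prediction and the latent representation are learned end to end.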
We introduce a novel technique for distribution learning based on a notion of sample compression. Any class of distributions that allows such a compression scheme can be learned with few samples. Moreover, if a class of distributions has such a compression scheme, then so do the classes of products and mixtures of those distributions. As an application of this technique, we prove that $\tilde{\Theta}(kd^2/\varepsilon^2)$ samples are necessary and sufficient for learning a mixture of $k$ Gaussians in $\mathbb{R}^d$, up to error $\varepsilon$ in total variation distance. This improves both the known upper bounds and lower bounds for this problem. For mixtures of axis-aligned Gaussians, we show that $\tilde{O}(kd/\varepsilon^2)$ samples suffice, matching a known lower bound. Moreover, these results hold in an agnostic learning (or robust estimation) setting, in which the target distribution is only approximately a mixture of Gaussians. Our main upper bound is proven by showing that the class of Gaussians in $\mathbb{R}^d$ admits a small compression scheme.
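The result itself is information-theoretic, but the learning task it bounds is easy to state in code. The sketch below is illustrative only: it uses scikit-learn's EM-based GaussianMixture as a stand-in for the paper's compression-based learner, and Monte-Carlo-estimates the total variation error via $\mathrm{TV}(p,q) = \tfrac{1}{2}\,\mathbb{E}_{x \sim p}\,|1 - q(x)/p(x)|$.

```python
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
k, d, n = 3, 2, 20000                      # components, dimension, sample size

# Ground-truth mixture of k Gaussians in R^d (arbitrary illustrative values).
weights = np.array([0.5, 0.3, 0.2])
means = rng.normal(scale=4.0, size=(k, d))
covs = [np.eye(d) * s for s in (0.5, 1.0, 2.0)]

def true_logpdf(x):
    dens = sum(w * multivariate_normal.pdf(x, m, c)
               for w, m, c in zip(weights, means, covs))
    return np.log(dens)

# Draw n i.i.d. samples from the target mixture.
comp = rng.choice(k, size=n, p=weights)
X = np.stack([rng.multivariate_normal(means[c], covs[c]) for c in comp])

# Fit a k-component mixture (EM stands in for the paper's learner).
gmm = GaussianMixture(n_components=k, n_init=5, random_state=0).fit(X)

# Monte Carlo estimate of TV(p, q) = 0.5 * E_{x~p} |1 - q(x)/p(x)|.
ce = rng.choice(k, size=50000, p=weights)
Xe = np.stack([rng.multivariate_normal(means[c], covs[c]) for c in ce])
ratio = np.exp(gmm.score_samples(Xe) - true_logpdf(Xe))
print("estimated TV error:", 0.5 * np.abs(1.0 - ratio).mean())
```

The theorem says that, for the right learner, an error of $\varepsilon$ is achievable with $\tilde{\Theta}(kd^2/\varepsilon^2)$ samples, and that no learner can do with fewer.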
We consider PAC learning of probability distributions (a.k.a. density estimation), where we are given an i.i.d. sample generated from an unknown target distribution, and want to output a distribution that is close to the target in total variation distance. Let $F$ be an arbitrary class of probability distributions, and let $F_k$ denote the class of $k$-mixtures of elements of $F$. Assuming the existence of a method for learning $F$ with sample complexity $m_F(\varepsilon)$, we provide a method for learning $F_k$ with sample complexity $O(k \log k \cdot m_F(\varepsilon)/\varepsilon^2)$. Our mixture learning algorithm has the property that, if the $F$-learner is proper and agnostic, then the $F_k$-learner is proper and agnostic as well.

This general result enables us to improve the best known sample complexity upper bounds for a variety of important mixture classes. First, we show that the class of mixtures of $k$ axis-aligned Gaussians in $\mathbb{R}^d$ is PAC-learnable in the agnostic setting with $O(kd/\varepsilon^4)$ samples, which is tight in $k$ and $d$ up to logarithmic factors. Second, we show that the class of mixtures of $k$ Gaussians in $\mathbb{R}^d$ is PAC-learnable in the agnostic setting with sample complexity $O(kd^2/\varepsilon^4)$, which improves the previously known bounds of $O(k^3 d^2/\varepsilon^4)$ and $O(k^4 d^4/\varepsilon^2)$ in their dependence on $k$ and $d$. Finally, we show that the class of mixtures of $k$ log-concave distributions over $\mathbb{R}^d$ is PAC-learnable using $O(d^{(d+5)/2} \varepsilon^{-(d+9)/2} k)$ samples.
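To see how the concrete Gaussian bounds follow from the reduction, one can plug in the standard single-component sample complexities. The derivation below is a hedged reconstruction (it suppresses the logarithmic factors that the abstract also suppresses):

```latex
% Plugging single-component complexities m_F(\varepsilon) into the
% reduction O(k \log k \cdot m_F(\varepsilon)/\varepsilon^2):
\begin{align*}
m_F(\varepsilon) &= \tilde{O}(d/\varepsilon^2) \text{ (one axis-aligned Gaussian)}
  && \Longrightarrow\ \tilde{O}\!\big(k \, m_F(\varepsilon)/\varepsilon^2\big)
     = \tilde{O}(kd/\varepsilon^4), \\
m_F(\varepsilon) &= \tilde{O}(d^2/\varepsilon^2) \text{ (one general Gaussian)}
  && \Longrightarrow\ \tilde{O}\!\big(k \, m_F(\varepsilon)/\varepsilon^2\big)
     = \tilde{O}(kd^2/\varepsilon^4).
\end{align*}
```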
We present a fairly general framework for reducing $(\varepsilon, \delta)$ differentially private (DP) statistical estimation to its non-private counterpart. As the main application of this framework, we give a polynomial-time $(\varepsilon, \delta)$-DP algorithm for learning (unrestricted) Gaussian distributions in $\mathbb{R}^d$. The sample complexity of our approach for learning the Gaussian up to total variation distance matches (up to logarithmic factors) the best known information-theoretic (non-efficient) sample complexity upper bound (Aden-Ali, Ashtiani, Kamath [1]). In an independent work, Kamath, Mouzakis, Singhal, Steinke, and Ullman [22] proved a similar result using a different approach, with an $O(d^{5/2})$ sample complexity dependence on $d$.

As another application of our framework, we provide the first polynomial-time $(\varepsilon, \delta)$-DP algorithm for robust learning of (unrestricted) Gaussians.
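The paper's reduction is more sophisticated, but the general flavor of turning a non-private estimator into a DP one can be illustrated with the classic sample-and-aggregate approach. This sketch is emphatically not the authors' algorithm: the 1-D setting, the bounded-range clipping, and the chunk count are illustrative assumptions; only the Gaussian-mechanism noise scale is standard.

```python
import numpy as np

def dp_estimate(data, nonprivate_estimator, eps, delta, lo, hi, m=10):
    """Sample-and-aggregate: run a non-private estimator on m disjoint
    chunks, clip each chunk's output to a known range [lo, hi], then
    release the average of the chunk estimates via the Gaussian mechanism."""
    rng = np.random.default_rng()
    chunks = np.array_split(rng.permutation(data), m)
    estimates = np.clip([nonprivate_estimator(c) for c in chunks], lo, hi)
    # Changing one data point affects one chunk, so the clipped average
    # moves by at most (hi - lo) / m: the sensitivity of the release.
    sensitivity = (hi - lo) / m
    sigma = sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / eps
    return estimates.mean() + rng.normal(scale=sigma)

# Toy run: privately estimate the mean of a 1-D Gaussian.
data = np.random.default_rng(1).normal(loc=3.0, scale=1.0, size=10000)
print(dp_estimate(data, np.mean, eps=1.0, delta=1e-5, lo=-10.0, hi=10.0))
```

The appeal of reductions of this shape, as in the paper, is that the non-private estimator is treated as a black box, so improvements to non-private estimation carry over to the private setting.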