Abstract: Presence-absence (0-1) observations are special in that often the absence of evidence is not evidence of absence. Here we develop an independent factor model, which has the unique capability to isolate the former as an independent discrete binary noise factor. This representation then forms the basis of inferring missed presences by means of denoising. This is achieved in a probabilistic formalism, employing independent beta latent source densities and a Bernoulli data likelihood model. Variational approximati…
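The abstract's core idea — a missed detection turns a true presence into an observed absence — can be sketched as a generative process. This is a minimal illustration, not the paper's actual model; the variable names (`true_presence`, `missed`) and the rates are assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)

n_sites, n_species = 200, 5

# Actual 0/1 presences (assumed base rate, for illustration only).
true_presence = rng.random((n_sites, n_species)) < 0.6

# Independent binary noise factor: a missed detection turns a presence into
# an observed absence ("absence of evidence is not evidence of absence").
missed = rng.random((n_sites, n_species)) < 0.3
observed = true_presence & ~missed  # a presence is recorded only if not missed
```

Denoising in this setting amounts to inferring which observed zeros are missed presences rather than true absences.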
“…In addition to DDM and MAC, there are two other probabilistic methods that have been developed in different contexts. Binary Independent Component Analysis (BICA) [22] learns binary vectors that can be combined to fit the data. These vectors, representing the roles in our setting, are orthogonal, that is, each permission can be assigned to only one role.…”
Role mining tackles the problem of finding a role-based access control (RBAC) configuration, given an access-control matrix assigning users to access permissions as input. Most role mining approaches work by constructing a large set of candidate roles and then using a greedy selection strategy to iteratively pick a small subset such that the differences between the resulting RBAC configuration and the access-control matrix are minimized. In this paper, we advocate an alternative approach that recasts role mining as an inference problem rather than a lossy compression problem. Instead of using combinatorial algorithms to minimize the number of roles needed to represent the access-control matrix, we derive probabilistic models to learn the RBAC configuration that most likely underlies the given matrix. Our models are generative in that they reflect the way that permissions are assigned to users in a given RBAC configuration. We additionally model how user-permission assignments that conflict with an RBAC configuration emerge, and we investigate the influence of constraints on role hierarchies and on the number of assignments. In experiments with access-control matrices from real-world enterprises, we compare our proposed models with other role mining methods. Our results show that our probabilistic models infer roles that generalize well to new system users for a wide variety of data, while other models' generalization abilities depend on the dataset given.
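The generative view described in this abstract can be sketched as a Boolean matrix product with bit-flip noise. This is a hedged illustration of the general setup, not the paper's specific model; the matrix names and noise rate are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

n_users, n_roles, n_perms = 6, 3, 8

# Hypothetical RBAC configuration: user-role and role-permission assignments.
user_role = rng.random((n_users, n_roles)) < 0.5   # which roles each user holds
role_perm = rng.random((n_roles, n_perms)) < 0.4   # which permissions each role grants

# Boolean matrix product: a user holds a permission iff some assigned role grants it.
x_clean = (user_role.astype(int) @ role_perm.astype(int)) > 0

# Bit-flip noise models user-permission assignments that conflict with the
# RBAC configuration (both spurious grants and missing grants).
noise = rng.random(x_clean.shape) < 0.05
x_observed = np.logical_xor(x_clean, noise)
```

Inference then asks which `user_role` and `role_perm` most likely produced `x_observed`, rather than which smallest role set exactly compresses it.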
“…In [3], the problem of factorization and de-noise of binary data due to independent continuous sources is considered. The sources are assumed to be continuous following beta distribution in [0, 1].…”
Section: Related Work
confidence: 99%
“…A post-process step is applied to quantize the recovered "gray-scale" sources into binary ones. While the mixing model in [3] can find many real world applications, it is not suitable in the case of OR mixtures.…”
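The mixing model attributed to [3] in these snippets — continuous Beta-distributed sources in [0, 1], mixed into Bernoulli success probabilities, with binary sources recovered by quantization — can be sketched as follows. Mixing weights and shape parameters here are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

n_samples, n_sources, n_obs = 1000, 2, 4

# Latent sources: continuous in [0, 1], Beta-distributed.
s = rng.beta(0.5, 0.5, size=(n_samples, n_sources))

# Convex mixing keeps the result in [0, 1], so it can serve as a Bernoulli mean.
w = rng.dirichlet(np.ones(n_sources), size=n_obs)  # each row sums to 1
p = s @ w.T                                        # success probabilities

# Binary observations drawn from the Bernoulli likelihood.
x = (rng.random(p.shape) < p).astype(int)

# Post-processing step: quantize the "gray-scale" sources into binary ones.
s_binary = (s > 0.5).astype(int)
```

Note how the quantization is a separate post-process: the model itself never produces binary sources, which is why it does not fit the OR-mixture setting discussed next.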
Section: Related Work
confidence: 99%
“…Since BFA always finds factors no larger than the number of attributes, the resulting factors are clearly dependent in this case. Finally, [5] considers the under-determined case of fewer observations than latent sources with continuous noise, while [3], [7], [10], [11] deal with the over-determined case, where the number of observation variables is much larger. In this work, we consider primarily the under-determined cases that we typically encounter in data networks, where the number of sensors is much smaller and the number of signal sources (i.e.…”
Abstract: Independent component analysis (ICA) is a computational method for separating a multivariate signal into subcomponents, assuming the mutual statistical independence of the non-Gaussian source signals. The classical ICA framework usually assumes linear combinations of independent sources over the field of real-valued numbers R. In this paper, we investigate binary ICA for OR mixtures (bICA), which can find applications in many domains including medical diagnosis, multi-cluster assignment, Internet tomography, and network resource management. We prove that bICA is uniquely identifiable under the disjunctive generation model, and propose a deterministic iterative algorithm to determine the distribution of the latent random variables and the mixing matrix. The inverse problem of inferring the values of the latent variables is also considered, along with noisy measurements. We conduct an extensive simulation study to verify the effectiveness of the proposed algorithm and present examples of real-world applications where bICA can be applied.
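The disjunctive generation model behind bICA replaces the real-valued linear mixture of classical ICA with an OR of ANDs. A minimal sketch, with assumed variable names (`y` for the latent binary signals, `g` for the binary mixing matrix):

```python
import numpy as np

rng = np.random.default_rng(2)

n_samples, n_sources, n_obs = 5, 3, 4

# Binary latent sources and a binary mixing matrix (illustrative rates).
y = rng.random((n_samples, n_sources)) < 0.5  # latent binary signals
g = rng.random((n_obs, n_sources)) < 0.6      # g[j, i]: obs j is connected to source i

# Disjunctive (OR) mixture: x[t, j] = OR_i ( y[t, i] AND g[j, i] ).
# An integer matrix product counts the active contributing sources,
# and "> 0" collapses that count into the logical OR.
x = (y.astype(int) @ g.astype(int).T) > 0
```

This is why the "gray-scale" mixing model of [3] does not apply here: an OR mixture saturates at 1, so observations carry no information about how many sources fired, only whether any did.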
“…Kabán and Bingham (2008) decomposed the means (success probabilities) of Bernoulli-distributed variables into a convex combination of a priori Beta-distributed latent factors, and in this special case derived a lower bound of the likelihood function which can be used in a variational algorithm. Dimension reduction is less frequently obtained using the mean parameterization with constraints on the range of values than using the canonical parameterization.…”
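The contrast the snippet draws — mean parameterization with range constraints versus canonical parameterization — can be made concrete. The following is a sketch under assumed shapes and weights, not the algorithm from Kabán and Bingham (2008):

```python
import numpy as np

rng = np.random.default_rng(3)

n, k, d = 100, 2, 3

# Mean parameterization: Bernoulli means are a convex combination of
# a priori Beta-distributed latent factors, so they stay in [0, 1] by construction.
factors = rng.beta(2.0, 2.0, size=(n, k))
weights = rng.dirichlet(np.ones(k), size=d)  # convex-combination weights, rows sum to 1
means = factors @ weights.T

# Canonical parameterization: an unconstrained linear combination of latent
# variables, mapped through the logistic function -- no range constraint needed.
logits = rng.standard_normal((n, k)) @ rng.standard_normal((k, d))
means_canonical = 1.0 / (1.0 + np.exp(-logits))
```

The convexity constraint on `weights` is what makes the mean parameterization valid but, as the quote notes, also what makes dimension reduction less common in that form.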
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.