Modern social networks frequently encompass multiple distinct types of connectivity information; for instance, explicitly acknowledged friend relationships might complement behavioral measures that link users according to their actions or interests. One way to represent these networks is as multi-layer graphs, where each layer contains a unique set of edges over the same underlying vertices (users). Edges in different layers typically have related but distinct semantics; depending on the application, multiple layers might be used to reduce noise through averaging, to perform multifaceted analyses, or a combination of the two. However, it is not obvious how to extend standard graph analysis techniques to the multi-layer setting in a flexible way. In this paper we develop latent variable models and methods for mining multi-layer networks for connectivity patterns based on noisy data.

Index Terms: Hypergraphs, mixture graphical models, multigraphs, Pareto optimality.

Multi-layer networks arise naturally when there exists more than one source of connectivity information for a group of users. For instance, in a social networking context there is often knowledge of direct communication links, i.e., relational information. Examples of relational information include the frequency with which users communicate over social media, or whether a user has sent emails to or received emails from another user in a given time period. However, it is also possible to derive behavioral relationships based on user actions or interests. These behavioral relationships are inferred from information that does not directly connect users, such as individual preferences or usage statistics. In this paper we show how to deal with multiple layers of a social network when performing tasks like inference, clustering, and anomaly detection.

We propose a generative hierarchical latent-variable model for multi-layer networks, and show how to perform inference on its parameters.
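To make the multi-layer representation concrete, the following minimal sketch stores each layer as a set of undirected edges over one shared vertex set, derives a behavioral layer from per-user interests, and averages the layers into edge weights. All names (`behavioral_layer`, `average_layers`, the toy users and interests) are hypothetical, and the plain averaging shown here is only an illustration of the "noise reduction through averaging" idea, not the latent-variable model proposed in this paper.

```python
from itertools import combinations

def behavioral_layer(interests, min_shared=1):
    """Link two users when they share at least `min_shared` interests.

    `interests` maps each user to a set of interests (an assumed input format).
    """
    edges = set()
    for u, v in combinations(sorted(interests), 2):
        if len(interests[u] & interests[v]) >= min_shared:
            edges.add(frozenset((u, v)))
    return edges

def average_layers(layers, vertices):
    """Return, for each connected vertex pair, the fraction of layers containing it."""
    weights = {}
    for u, v in combinations(sorted(vertices), 2):
        pair = frozenset((u, v))
        count = sum(pair in edges for edges in layers.values())
        if count:
            weights[(u, v)] = count / len(layers)
    return weights

# Toy data: one relational layer (direct communication) and one behavioral layer.
vertices = {"ann", "bob", "cat"}
relational = {frozenset(("ann", "bob"))}
interests = {"ann": {"jazz", "chess"}, "bob": {"chess"}, "cat": {"jazz"}}

layers = {"relational": relational, "behavioral": behavioral_layer(interests)}
averaged = average_layers(layers, vertices)
```

Keeping every layer over the same vertex set means that layers can be compared or combined pair by pair, which is the property the multi-layer formulation relies on.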
Using techniques from Bayesian Model Averaging [1], the layers of the network are conditionally decoupled using a latent selection variable; this makes it possible to write the posterior probability of the latent variables given the multi-layer network. The resulting mixture can be viewed as a scalarization of a multi-objective optimization problem [2]-[4]. When the posterior probability functions are convex, the scalarization is both optimal and consistent with the Bayesian principle of model-averaged inference [2], [5].

Fig. 1. Adjacency and Observation Matrices. This graphical model depicts how the latent adjacency matrices can affect the observation matrices. Note that the observation matrices are dependent on all adjacency matrices in general.

We then step back from the Bayesian setting and discuss how multi-objective optimization can be used to perform MAP estimation of the desired latent variables. Using the concept of Pareto optimality [4], an entire front of solutions is defined; this allows a user to define a preference over optimization functions and tune the algorit...