In this paper, we advocate an optimization-centric view on Bayesian statistics and introduce a novel generalization of Bayesian inference. On both counts, our inspiration is the representation of Bayes' rule as an infinite-dimensional optimization problem, shown independently by Csiszár (1975), Donsker and Varadhan (1975), and Zellner (1988). First, we use this representation to prove a surprising optimality result for standard Variational Inference (VI): under the proposed view, the VI posterior maximizing the standard Evidence Lower Bound (ELBO) is always preferable to alternative approximations of the Bayesian posterior. Next, we argue for an optimization-centric generalization of standard Bayesian inference. The need for this generalization arises in situations of severe misalignment between reality and three assumptions underlying the standard Bayesian posterior: (1) well-specified priors, (2) well-specified likelihood models, and (3) the availability of infinite computing power. In response, our generalization is defined by three arguments and named the Rule of Three (RoT); each of its three arguments relaxes one of the assumptions underlying standard Bayesian inference. We derive the RoT axiomatically and recover existing methods as special cases, including the Bayesian posterior and its approximation by standard VI. In contrast, alternative approximations of the Bayesian posterior that maximize other ELBO-like objectives violate these axioms. Finally, we introduce a special case of the RoT that we call Generalized Variational Inference (GVI). GVI posteriors are a large and tractable family of belief distributions specified by three arguments: a loss, a divergence, and a variational family. GVI posteriors possess appealing theoretical properties, including consistency and an interpretation of their objective as an approximate ELBO.
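For concreteness, the two optimization problems the abstract refers to can be sketched as follows; the notation is ours rather than a verbatim statement from the paper, with $\pi$ denoting the prior, $q$ a candidate posterior, and $\mathcal{P}(\Theta)$ the set of all probability measures on the parameter space $\Theta$. The representation due to Csiszár, Donsker and Varadhan, and Zellner writes the exact Bayesian posterior as the solution of an infinite-dimensional optimization problem,
\[
q^{*}_{\mathrm{B}} \;=\; \operatorname*{arg\,min}_{q \in \mathcal{P}(\Theta)} \Big\{ \mathbb{E}_{q(\theta)}\Big[ -\sum_{i=1}^{n} \log p(x_i \mid \theta) \Big] \;+\; \mathrm{KL}\big(q \,\|\, \pi\big) \Big\}.
\]
The RoT, as described above, relaxes each ingredient of this objective in turn: the negative log likelihood becomes a generic loss $\ell$, the Kullback-Leibler divergence becomes a generic divergence $D$, and the feasible set $\mathcal{P}(\Theta)$ becomes a tractable family $\Pi$,
\[
P(\ell, D, \Pi) \;=\; \operatorname*{arg\,min}_{q \in \Pi} \Big\{ \mathbb{E}_{q(\theta)}\Big[ \sum_{i=1}^{n} \ell(\theta, x_i) \Big] \;+\; D\big(q \,\|\, \pi\big) \Big\}.
\]
On this reading, the Bayesian posterior corresponds to $P(-\log p, \mathrm{KL}, \mathcal{P}(\Theta))$, standard ELBO-maximizing VI to $P(-\log p, \mathrm{KL}, \mathcal{Q})$ for a variational family $\mathcal{Q}$, and GVI to the general case in which all three arguments may vary.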