The choice of constellation strongly affects the performance of communication systems. When designing constellations, both the locations of the points and their probabilities of occurrence can be optimized. These approaches are referred to as geometric and probabilistic shaping, respectively. Usually, the geometry of the constellation is fixed, e.g., quadrature amplitude modulation (QAM) is used. In such cases, the achievable information rate can still be improved by probabilistic shaping. Machine learning techniques for geometric shaping were recently proposed. However, probabilistic shaping is more challenging, as it requires the optimization of discrete distributions. In this work, we show how autoencoders can be leveraged to perform probabilistic shaping of constellations. We devise an information-theoretic description of autoencoders, which allows learning of capacity-achieving symbol distributions and constellations. Furthermore, the proposed method enables joint probabilistic and geometric shaping of constellations over any channel model. Simulation results show that the learned constellations achieve information rates very close to capacity on an additive white Gaussian noise (AWGN) channel and outperform existing approaches on both AWGN and fading channels.
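To make the notion of probabilistic shaping on a fixed geometry concrete, the following minimal sketch assigns non-uniform probabilities to the points of a 16-QAM grid and reports the resulting source entropy and average symbol energy. It assumes a Maxwell-Boltzmann family, p(x) proportional to exp(-lam |x|^2), which is a common hand-designed choice and is not the learned distribution of this work. The function names and the values of `lam` are hypothetical and chosen only for illustration.

```python
import numpy as np


def qam16():
    """Return the 16 points of a square 16-QAM grid (unnormalized)."""
    levels = np.array([-3.0, -1.0, 1.0, 3.0])
    return (levels[:, None] + 1j * levels[None, :]).ravel()


def maxwell_boltzmann(points, lam):
    """Probabilities proportional to exp(-lam * |x|^2) (assumed family)."""
    weights = np.exp(-lam * np.abs(points) ** 2)
    return weights / weights.sum()


def entropy_bits(p):
    """Shannon entropy of a discrete distribution, in bits."""
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())


def average_energy(points, p):
    """Mean symbol energy E[|x|^2] under the distribution p."""
    return float((p * np.abs(points) ** 2).sum())


points = qam16()
for lam in (0.0, 0.05, 0.1):  # lam = 0 gives the uniform distribution
    p = maxwell_boltzmann(points, lam)
    print(f"lam={lam:.2f}  entropy={entropy_bits(p):.3f} bits  "
          f"energy={average_energy(points, p):.3f}")
```

Increasing `lam` favors low-energy points, trading source entropy for a lower average energy on the same grid. Geometric shaping would instead move the points themselves.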