With the rapid rise of the cloud computing paradigm, the manual maintenance and provisioning of the underlying technological layers, in both their hardware and virtualized forms, has become cumbersome and error-prone. This has created a need for automated capacity planning strategies in heterogeneous cloud computing environments. However, even with mechanisms to fully accommodate customers and fulfill service-level agreements, providers often over-provision their hardware and virtual resources. The resulting proliferation of unused capacity leads to higher energy costs and, correspondingly, higher prices for cloud services. Capacity planning algorithms rely on data collected from the utilized resources. Yet the volume of data aggregated by monitoring hardware and virtual instances rules out manual supervision, let alone manual data analysis, correlation, or anomaly detection. Recent advances in data science support the efficient automation, scheduling, and provisioning of cloud computing resources based on supervised and unsupervised machine learning techniques. In this work, we present the current state of the art in monitoring, storage, analysis, and adaptation approaches for the data produced by cloud computing environments, with the aim of enabling proactive, dynamic resource provisioning.