Reinforcement learning (RL) is often considered a promising approach for controlling complex building operations. In this context, RL algorithms are typically evaluated using a testing framework that simulates building operations. To support general claims and avoid overfitting, an RL method should be evaluated on a large and diverse set of buildings. Unfortunately, because building simulations are complex to create, none of the existing frameworks provides more than a handful of simulated buildings. Moreover, each framework has its own particularities, which makes it difficult to evaluate the same algorithm across multiple frameworks. To address this, we present Beobench: a Python toolkit that uses a container-based approach to provide unified access to building simulations from multiple frameworks. We demonstrate this approach with an example showing how Beobench can launch RL experiments in any supported framework with a single command.