Engineering a physical system to exhibit designated characteristics poses an inverse design problem, which is often governed by several discrete and continuous parameters. If such a system must exhibit a particular behavior, the combination of discrete and continuous parameters results in a challenging optimization problem that requires an extensive search for an optimal system design. However, if the corresponding inverse design problem can be reformulated as a parameterized Markov decision process, reinforcement learning (RL) provides a heuristic framework to solve it. In this work, we use multi-layer thin films as an example of such optimization problems and consider three design parameters: each layer's dielectric material (discrete) and thickness (continuous), as well as the total number of layers (discrete). While recent methods merely determine the optimal thicknesses and, less commonly, the layers' materials, our approach also optimizes the total number of stacked layers. We further develop a Q-learning variant for inverse design optimization that outperforms human experts and current approaches such as needle-point optimization or naive RL. To this end, we propose an exponentially transformed reward signal that eases policy search and enables constrained optimization. Moreover, the learned Q-values contain information about the optical properties of multi-layer thin films, which allows for physical interpretation and what-if analysis and thus enables explainability.