Abstract: Stress is a response to time pressure or negative environmental conditions. If the stimulus recurs or persists for a long time, it can harm health. Stress recognition is therefore an important problem. Traditional systems for this purpose are mostly contact-based, i.e., they require a sensor to be in touch with the body, which is not always practical. Contact-free monitoring of stress with a camera [1], [2] is an alternative. Such systems usually use either an RGB camera or a thermal camera alone. To the best of our knowledge, the only work that fuses these two modalities for stress recognition is [3], which applies feature-level fusion to features extracted directly from pixel values. In this paper, we show that extracting the features from super-pixels and then applying decision-level fusion yields a system that outperforms [3]. Experimental results on the ANUstressDB database show that our system achieves 89% classification accuracy.
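To illustrate what decision-level fusion means here, the following minimal sketch combines the class probabilities produced by two independently trained classifiers, one per modality. It is not the method of this paper: the fusion rule (a weighted average), the function name `fuse_decisions`, the weight `w_rgb`, and the label convention (0 = not stressed, 1 = stressed) are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def fuse_decisions(
    p_rgb: Sequence[float],
    p_thermal: Sequence[float],
    w_rgb: float = 0.5,
) -> Tuple[int, List[float]]:
    """Fuse per-class probabilities from two modality-specific classifiers.

    Hypothetical helper: the weighted-average rule is an assumption,
    not the fusion rule used in the paper.
    """
    if len(p_rgb) != len(p_thermal):
        raise ValueError("both classifiers must score the same classes")
    if not 0.0 <= w_rgb <= 1.0:
        raise ValueError("w_rgb must lie in [0, 1]")

    # Each modality is classified separately; only the decisions are combined.
    fused = [w_rgb * a + (1.0 - w_rgb) * b for a, b in zip(p_rgb, p_thermal)]

    # The predicted class is the one with the highest fused score.
    label = max(range(len(fused)), key=fused.__getitem__)
    return label, fused


# Assumed label convention: index 0 = not stressed, index 1 = stressed.
label, fused = fuse_decisions([0.6, 0.4], [0.2, 0.8])
print(label)
```

In contrast, feature-level fusion as in [3] concatenates the feature vectors of both modalities before a single classifier is trained, so the combination happens before any decision is made.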