Emotions play a significant role in people's lives and interactions, yet automatic recognition of human emotions by computer systems remains a challenging task. Many approaches to automatic emotion recognition have been proposed in recent decades, and the vast majority of them rely on a single type of input, e.g., image, text, or audio. This can lead to incorrect results, since people can easily mask their emotions in any one channel. In this paper, we present a study of the correlation (or inter-agreement) of the results produced by six existing emotion recognition approaches that process different kinds of input. The results show low agreement among the approaches, even when they use the same type of input, indicating that more research is needed to determine the possible causes and to improve the quality of existing emotion detection tools.
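As an illustration of how inter-agreement between two such tools might be quantified, the sketch below computes Cohen's kappa, a standard chance-corrected agreement statistic, over a toy set of per-sample emotion labels. The tool names and label sequences are hypothetical examples, not data from the study.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: chance-corrected agreement between two raters."""
    n = len(labels_a)
    # Observed agreement: fraction of samples where both tools agree.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement by chance, from each tool's marginal label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(count_a[lbl] * count_b[lbl] for lbl in count_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical per-sample predictions from two emotion recognition tools.
tool_1 = ["joy", "anger", "joy", "sadness", "joy", "anger"]
tool_2 = ["joy", "joy", "joy", "sadness", "anger", "anger"]
print(round(cohen_kappa(tool_1, tool_2), 4))  # 0.4545 — only moderate agreement
```

A kappa of 1 indicates perfect agreement and 0 indicates agreement no better than chance; values in this moderate range are consistent with the low inter-agreement reported above.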