Abstract. Evaluating agent intelligence is a fundamental issue for the understanding, construction and improvement of autonomous agents. New intelligence tests have recently been developed that assess task complexity using algorithmic information theory. Early experimental results suggest that these tests may be able to distinguish between agents of the same kind, but that they do not place very different agents, e.g., humans and machines, correctly on a common scale. One suggested explanation is that these tests do not measure social intelligence. A formal approach to incorporating social environments into an intelligence test is the recent notion of the Darwin-Wallace distribution. Inspired by this distribution, we present several new test settings involving competition and cooperation, in which we evaluate the "social intelligence" of several reinforcement learning algorithms. The results show that evaluating social intelligence raises many issues that must be addressed before tests of social intelligence can be devised.