Societies are complex. Properties of social systems can be explained by the interplay and weaving of individual actions. Rewards are key to understanding people's choices and decisions. For instance, individual preferences about where to live may lead to the emergence of social segregation. In this paper, we combine Reinforcement Learning (RL) with Agent Based Modeling (ABM) to address the self-organizing dynamics of social segregation and explore the space of possibilities that emerge from considering different types of rewards. Our model promotes the creation of interdependencies and interactions among multiple agents of two different kinds that segregate from each other. For this purpose, agents use Deep Q-Networks to make decisions inspired by the rules of the Schelling Segregation model and rewards for interactions. Despite the segregation reward, our experiments show that spatial integration can be achieved by establishing interdependencies among agents of different kinds. They also reveal that segregated areas are more likely to host older people than diverse areas, which attract younger ones. Through this work, we show that the combination of RL and ABM can create an artificial environment for policy makers to observe potential and existing behaviors associated with rules of interactions and rewards.

The recent availability of large datasets collected from various sources, such as digital transactions, location data and government census, is transforming the ways we study and understand social systems 1. Researchers and policy makers are able to observe and model social interactions and dynamics in great detail, including the structure of friendship networks 2, the behavior of cities 3, politically polarized societies 4, or the spread of information on social media 5. These studies show the behaviors present in the data but do not explore the space of possibilities into which human dynamics may evolve.
Robust policies should consider mechanisms to respond to every type of event 6, including those that are very rare 7. Therefore, it is crucial to develop simulation environments in which potentially unobserved social dynamics can be assessed empirically.

Agent Based Modeling (ABM) is a generative approach to study natural phenomena based on the interaction of individuals 8 in social, physical and biological systems 9. These models show how different types of individual behavior give rise to emergent macroscopic regularities 10,11 with forecasting capabilities 12. Applications to social systems include the emergence of wealth distributions 13, new political actors 14, multipolarity in interstate systems 15, and cultural differentiation 16, among other applications 9. ABM allows testing core sociological theories against simulations 13, with emphasis on heterogeneous, autonomous actors with bounded, spatial information 17. Such models provide a framework to understand complex behaviors like those of economic systems 18,19, as well as individual 20 and organizational 21,22 decision-making processes. These ...