Human visual cortex is organized into regions that respond preferentially to different categories of objects (e.g., faces, bodies, artifacts, scenes). However, people often need to integrate information about objects from different categories to make inferences about the world. How does the brain integrate information represented in different category-selective regions? In this work, we investigated this question using a new analysis approach. Using artificial neural networks, we modeled the multivariate statistical dependence between fMRI responses in different brain regions. Regions whose responses were predicted significantly better by a combination of multiple category-selective regions than by the best-predicting category-selective region taken individually were identified as integration hubs. We used this approach to analyze fMRI responses to complex dynamic stimuli (the movie Forrest Gump) and identified five integration hubs: 1) the posterior medial thalamus, 2) the middle cingulate gyrus, 3) the posterior cingulate gyrus, 4) the angular gyrus, and 5) the cerebellum. Hubs were identified robustly across different artificial neural network architectures. Furthermore, representational similarity analysis revealed that, unlike in category-selective regions, the representational geometry in integration hubs is not driven by the animate/inanimate distinction. These results indicate that a small set of localized regions integrates visual information about different object categories, and suggest that integration across multiple categories leads to a transformation of the similarity structure of neural representations.
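The hub-identification criterion described above can be illustrated with a minimal sketch: predict a candidate target region's multivariate response from each category-selective region alone and from all of them combined, and flag the target as a hub if the combined model does better than the best single-region model. This is a conceptual illustration, not the authors' pipeline: the region names, data shapes, scikit-learn MLP model, and accuracy measure are all assumptions, and the statistical test for "significantly better" is omitted.

```python
# Conceptual sketch (not the authors' pipeline): flag an "integration hub" as a
# target region whose multivariate response is predicted better by a combination
# of category-selective regions than by the best single region.
# All region names, shapes, and model choices are illustrative assumptions.

import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import cross_val_predict

def prediction_accuracy(X, Y, n_splits=5, seed=0):
    """Cross-validated accuracy: mean voxelwise correlation between
    predicted and observed responses in the target region."""
    model = MLPRegressor(hidden_layer_sizes=(100,), max_iter=2000,
                         random_state=seed)
    Y_pred = cross_val_predict(model, X, Y, cv=n_splits)
    # Correlate each target voxel's predicted and observed time course.
    r = [np.corrcoef(Y[:, v], Y_pred[:, v])[0, 1] for v in range(Y.shape[1])]
    return float(np.mean(r))

# Hypothetical data: time points x voxels for each category-selective region
# (predictors) and for one candidate target region.
rng = np.random.default_rng(0)
n_timepoints = 300
predictors = {                      # category-selective regions
    "faces":  rng.standard_normal((n_timepoints, 50)),
    "bodies": rng.standard_normal((n_timepoints, 50)),
    "scenes": rng.standard_normal((n_timepoints, 50)),
}
target = rng.standard_normal((n_timepoints, 40))   # candidate hub region

# Accuracy when predicting from each category-selective region alone.
individual = {name: prediction_accuracy(X, target)
              for name, X in predictors.items()}

# Accuracy when predicting from all category-selective regions combined.
combined = prediction_accuracy(np.hstack(list(predictors.values())), target)

best_single = max(individual.values())
print(f"best single-region accuracy: {best_single:.3f}")
print(f"combined-region accuracy:    {combined:.3f}")

# In the actual analysis, "significantly better" would require a proper
# statistical test (e.g. a permutation test), omitted in this sketch.
if combined > best_single:
    print("candidate integration hub (pending significance testing)")
```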