A key issue that continues to generate controversy concerns the nature of the psychological, computational, and neural mechanisms that support the visual recognition of objects such as faces and words. While some researchers claim that visual recognition is accomplished by category-specific modules dedicated to processing distinct object classes, others have argued for a more distributed system with only partially specialized cortical regions. Considerable evidence from both functional neuroimaging and neuropsychology would seem to favour the modular view, yet close examination of those data reveals graded patterns of specialization that support a more distributed account. This paper explores a theoretical middle ground in which the functional specialization of brain regions arises from general principles and constraints on neural representation and learning that operate throughout cortex but that nonetheless have distinct implications for different classes of stimuli. The account is supported by a computational simulation, in the form of an artificial neural network, which illustrates how cooperative and competitive interactions in the formation of neural representations for faces and words account for both their shared and distinctive properties. We set out and examine a series of empirical predictions, and consider the further implications of this account.
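To convey the flavour of such a simulation, the sketch below assumes a deliberately toy setup rather than the model reported here: a small feedforward network with a single hidden layer shared between two arbitrary stimulus classes standing in for faces and words. All layer sizes, learning parameters, and the pattern-generation scheme are invented for illustration. The only point of the sketch is that, when two classes of stimuli are learned by a common pool of units, individual hidden units typically end up with graded rather than all-or-none class preferences.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stimuli: two classes (stand-ins for "faces" and "words"), each built
# from a few class-specific prototypes plus noise. Every item must be
# individuated at the output, loosely analogous to recognising a
# particular face or word.
n_inputs, n_hidden, n_per_class, n_copies = 40, 20, 8, 4
protos = rng.normal(size=(2, n_per_class, n_inputs))
X, Y = [], []
for c in range(2):
    for p in range(n_per_class):
        for _ in range(n_copies):
            X.append(protos[c, p] + 0.3 * rng.normal(size=n_inputs))
            y = np.zeros(2 * n_per_class)
            y[c * n_per_class + p] = 1.0
            Y.append(y)
X, Y = np.array(X), np.array(Y)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One hidden layer shared by both classes, trained with plain
# backpropagation on a sum-of-squares loss; all hyperparameters are
# arbitrary choices for illustration.
W1 = 0.1 * rng.normal(size=(n_inputs, n_hidden))
W2 = 0.1 * rng.normal(size=(n_hidden, 2 * n_per_class))
lr = 0.5
for epoch in range(5000):
    H = sigmoid(X @ W1)
    O = sigmoid(H @ W2)
    d_out = (O - Y) * O * (1 - O)          # output-layer error signal
    d_hid = (d_out @ W2.T) * H * (1 - H)   # backpropagated hidden error
    W2 -= lr * (H.T @ d_out) / len(X)
    W1 -= lr * (X.T @ d_hid) / len(X)

# Graded specialization: compare each hidden unit's mean activation across
# the two classes. Values near +/-1 indicate class-selective units; values
# near 0 indicate units shared by both classes.
H = sigmoid(X @ W1)
is_class0 = Y[:, :n_per_class].sum(axis=1) > 0
mean0, mean1 = H[is_class0].mean(axis=0), H[~is_class0].mean(axis=0)
selectivity = (mean0 - mean1) / (mean0 + mean1 + 1e-9)
print("Per-unit class selectivity (sorted):")
print(np.round(np.sort(selectivity), 2))
```

The printed selectivity values typically span a continuum rather than splitting into two disjoint groups, which is the graded, partially specialized pattern the abstract describes; the full model in the paper additionally incorporates the cooperative and competitive interactions that this minimal sketch omits.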