We present multi-core architectures for object detection. We depart from the traditional Multi-Processor (MP) architecture by using cacheable accesses to main memory to create atomic cores and by keeping all program data in local memory. Main memory is partitioned in software into dedicated data regions, allowing atomic accesses by the cores without the need for synchronisation primitives. With this arrangement, we demonstrate how multi-threading techniques such as Interleaved Task Reordering (ITR) can be used to balance the processing load across the available cores. We implement and test designs with up to 7 soft-cores running the Viola-Jones face detection algorithm and achieve a performance increase of up to 9.14x with a 100% detection rate, surpassing the theoretical performance increase of multi-core processors for all designs and test images. We also surpass the performance increases reported for multi-core implementations in the literature, which shows our custom designs to be a more viable solution for multi-core object detection applications. Finally, resource and power consumption estimates indicate that our designs are suitable for deployment in embedded systems.
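
The two ideas above, software partitioning of main memory and interleaving of tasks across cores, can be illustrated with a minimal sketch. It is not the implementation evaluated here: the function names, the byte-range model of a memory region, and the plain round-robin assignment standing in for ITR are all assumptions made for illustration only.

```python
from math import ceil


def partition_regions(total_bytes, n_cores):
    """Split [0, total_bytes) into n_cores disjoint, contiguous regions.

    Hypothetical model: each core owns exactly one (start, end) range, so no
    two cores ever touch the same address and no locks are needed.
    """
    base, extra = divmod(total_bytes, n_cores)
    regions, start = [], 0
    for core in range(n_cores):
        # The first `extra` cores absorb the remainder, one byte each.
        size = base + (1 if core < extra else 0)
        regions.append((start, start + size))
        start += size
    return regions


def contiguous_loads(task_costs, n_cores):
    """Per-core load when tasks are assigned in consecutive blocks."""
    chunk = ceil(len(task_costs) / n_cores)
    loads = [0] * n_cores
    for i, cost in enumerate(task_costs):
        loads[i // chunk] += cost
    return loads


def interleaved_loads(task_costs, n_cores):
    """Per-core load when task i goes to core i % n_cores (round-robin).

    Assumption: a generic interleaving used as a stand-in for ITR.
    """
    loads = [0] * n_cores
    for i, cost in enumerate(task_costs):
        loads[i % n_cores] += cost
    return loads


# Hypothetical workload: 14 tasks whose cost falls steadily, as might happen
# when a detector scans progressively smaller image scales.
costs = list(range(14, 0, -1))
print(partition_regions(1000, 7))
print(max(contiguous_loads(costs, 7)), max(interleaved_loads(costs, 7)))
```

The busiest core determines completion time. For this hypothetical workload, interleaving lowers the maximum per-core load from 27 to 21 cost units, against an ideal of 15 for a perfectly even split.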