Accurate and efficient detection of shoppers in supermarkets, and their classification by age group, gender, and use of a cart or trolley, is essential for strategic retail planning. With the advent of deep learning, various models based on Convolutional Neural Networks (CNNs) have been proposed for object detection in high-resolution spatial images. In this article, we propose a two-phase architecture. In the first phase, shopper detection, we conducted a comparative study of three well-established CNN-based models, namely Single Shot MultiBox Detector (SSD), You Only Look Once v8 (YOLOv8), and Faster R-CNN, to detect shoppers by age group, gender, and shopping basket. The models were trained using transfer learning and fine-tuning. The evaluation of accuracy and efficiency shows that YOLOv8 performed best in terms of mean Average Precision (mAP) and Frames Per Second (FPS), as well as under visual inspection of its detections. SSD ran at roughly twice the FPS of Faster R-CNN, while the two achieved similar mAP on the test set. The trained models were also applied to two independent test sets, demonstrating their transferability and showing that higher-resolution images improve accuracy. In the second phase, we compared the classification models Residual Network 50 (ResNet50), Visual Geometry Group (VGG) 16 and 19, and Densely Connected Convolutional Network 121 (DenseNet121). VGG16 achieved a positive prediction rate of 99.28% and VGG19 98.55%, with ResNet50 and DenseNet121 scoring slightly lower.
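The detectors are compared mainly by mAP. The abstract does not state which mAP variant was used, so the following is only a minimal sketch. It assumes PASCAL VOC-style all-point interpolated Average Precision at an IoU threshold of 0.5, with boxes given as `(x1, y1, x2, y2)`. Each class (for example an age-group/gender/basket label) gets its own AP, and mAP is the mean over classes. All function names here are illustrative and are not taken from the authors' code.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # assumed (x1, y1, x2, y2) in pixels
Detection = Tuple[str, float, Box]       # (image_id, confidence score, box)


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(detections: List[Detection],
                      ground_truth: Dict[str, List[Box]],
                      iou_thr: float = 0.5) -> float:
    """AP for one class, assuming VOC-style all-point interpolation."""
    n_gt = sum(len(boxes) for boxes in ground_truth.values())
    if n_gt == 0:
        return 0.0
    # Each ground-truth box may be matched by at most one detection.
    matched = {img: [False] * len(boxes) for img, boxes in ground_truth.items()}
    tp_flags = []
    # Process detections from most to least confident.
    for img, _score, box in sorted(detections, key=lambda d: d[1], reverse=True):
        best_iou, best_j = 0.0, -1
        for j, gt_box in enumerate(ground_truth.get(img, [])):
            overlap = iou(box, gt_box)
            if overlap > best_iou:
                best_iou, best_j = overlap, j
        if best_j >= 0 and best_iou >= iou_thr and not matched[img][best_j]:
            matched[img][best_j] = True
            tp_flags.append(1)  # true positive
        else:
            tp_flags.append(0)  # false positive (no match or duplicate)
    # Precision and recall after each ranked detection.
    precisions, recalls, tp = [], [], 0
    for k, flag in enumerate(tp_flags, start=1):
        tp += flag
        precisions.append(tp / k)
        recalls.append(tp / n_gt)
    # Make precision monotonically non-increasing (interpolated envelope).
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Area under the interpolated precision-recall curve.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(
        per_class: Dict[str, Tuple[List[Detection], Dict[str, List[Box]]]]) -> float:
    """mAP: the unweighted mean of per-class AP values."""
    aps = [average_precision(dets, gts) for dets, gts in per_class.values()]
    return sum(aps) / len(aps) if aps else 0.0
```

For example, consider two ground-truth shoppers and three ranked detections: a hit, a miss, then a second hit. Precision is 1.0 up to recall 0.5 and 2/3 up to recall 1.0, which gives an AP of 5/6. Other protocols, such as COCO-style averaging over several IoU thresholds, would give different numbers.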