The High-Performance Computing (HPC) community has recently started to use containerization to obtain fast, customized, portable, flexible, and reproducible deployments of its workloads. Previous work showed that deploying an HPC workload in a single container can preserve bare-metal performance. However, there is a lack of research on multi-container deployments that partition the processes belonging to each application into different containers. Partitioning HPC applications has been shown to improve their performance on virtual machines by allowing each partition to be given affinity to a NUMA (Non-Uniform Memory Access) domain. Consequently, it is essential to understand the performance implications of distinct multi-container deployment schemes for HPC workloads, focusing on the impact of container granularity and its combination with processor and memory affinity. This paper presents a systematic performance comparison and analysis of multi-container deployment schemes for HPC workloads on a single-node platform, which considers different containerization technologies (including Docker and Singularity), two platform architectures (UMA and NUMA), and two application subscription modes (exactly-subscription and over-subscription). Our results indicate that finer-grained multi-container deployments can, on the one hand, benefit the performance of some applications with low inter-process communication, especially in over-subscribed scenarios and when combined with affinity; on the other hand, they can incur some performance degradation for communication-intensive applications when using containerization technologies that deploy isolated network namespaces.