A fully parallel version of the Contact Dynamics (CD) method is presented in this paper. For large enough systems, 100% efficiency has been demonstrated for up to 256 processors using a hierarchical domain decomposition with dynamic load balancing. The iterative scheme to calculate the contact forces is left domain-wise sequential, with data exchange after each iteration step, which ensures its stability. The number of additional iterations required for convergence by the partially parallel updates at the domain boundaries becomes negligible with increasing number of particles, which allows for an effective parallelization. Compared to the sequential implementation, we found no influence of the parallelization on simulation results. decomposition leads to causality problems. The algorithm presented in [15] conserves causality by reverting to an older state when violated. The best efficiency reached so far is a speedup proportional to the square root of the number of processors [15].In contrast to ED, lasting contacts between rigid bodies are considered in the realm of (multi)-rigid-body dynamics. Common to all its realizations is the treatment of contact forces as constraint forces, preventing interpenetration and, to a certain extent in the case of frictional contacts, sliding. When applying the rigid body modelling to problems like e.g. robotics [16,17] or granular media [18,19,20,21], different algorithms can in principle be used. Approximations with respect to the constraint of dry Coulomb friction enable the usage of powerful standard techniques for linear complementary problems (LCP) [22]. Other algorithms keep the isotropic friction cone, using a solver based on a modified time stepping scheme leading to a cone complementary problem (CCP) for the simulation of frictional contact dynamics [23]. Other approximations, leading to fast frictional dynamics (FFD) [24], yield a computational cost being only linear in the number of contacts and thus allow for impressive system sizes in terms of the number of particles. For investigations of e.g. the stress field in granular media, these approximations are prohibitive, though, and thus the non-smooth contact dynamics method [7], or commonly just contact dynamics, is widely employed. We will sketch the principle of this iterative procedure in section 2.1. Parallelization of the FFD method is straightforward and efficient [25,26], on the other hand, the parallel version suffers also from the undesired approximations. The parallel implementation of the CCP algorithm by the use of the Graphics Processing Unit (GPU) for large-scale multibody dynamics simulations is presented in [27]. In the present work we investigate the impact of the parallelization on the numerical solution of the CD method going beyond [25,26,27].Providing a parallel CD code is motivated by the need for large-scale simulations of dense granular systems of hard particles. The computation time even scales as O(N 1+2/d ) with the number of particles in CD [8] (d is the dimension of the system), while it...