Particle simulations are important workloads in high-performance and parallel computing. Because large numbers of particles migrate between parallel processes during a simulation, efficient and balanced parallel execution is a practical challenge in large-scale realistic particle applications. This paper proposes a novel approach to enable highly efficient dynamic load balancing in a coupled DSMC/PIC solver for large-scale numerical simulations of the plasma plume. We employ dual unstructured grids of different granularity, with a coarse grid for DSMC simulation of the flow field and an embedded fine grid for PIC simulation of the electric field, to facilitate both the coupled DSMC/PIC calculation and the grid partitioning for parallel computing. We then design and implement both a centralized and a distributed communication strategy to dynamically migrate particles among arbitrary parallel processes. For the timestep iterations, we present a lightweight dynamic load balancer, composed of a load imbalance factor, a weighted load model, and an efficient grid remapping mechanism, which adaptively rebalances the simulation among parallel processes with little extra overhead. To validate the coupled solver, we perform 3D unsteady simulations of the plasma plume induced by a pulsed vacuum arc in a cylindrical nozzle with hydrogen atoms and ions. Parallel performance results, scaling up to thousands of processes with billions of particles, demonstrate the efficiency and effectiveness of our dynamic load balancer and parallel implementation.
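To make the idea of a load imbalance factor and a weighted load model concrete, the following is a minimal illustrative sketch, not the solver's actual formulation. It assumes that a process's load is a weighted sum of its particle count and cell count, and that the imbalance factor is the ratio of the maximum to the mean per-process load. The function names, the weights `w_particle` and `w_cell`, and the rebalance `threshold` are all hypothetical.

```python
from typing import Sequence


def weighted_load(n_particles: int, n_cells: int,
                  w_particle: float = 1.0, w_cell: float = 0.1) -> float:
    """Estimate one process's load as a weighted sum (assumed model).

    The weights are placeholders; in practice they would be tuned or measured.
    """
    return w_particle * n_particles + w_cell * n_cells


def imbalance_factor(loads: Sequence[float]) -> float:
    """Return max(load) / mean(load); 1.0 means perfectly balanced."""
    mean = sum(loads) / len(loads)
    if mean == 0:
        return 1.0  # no work anywhere, treat as balanced
    return max(loads) / mean


def needs_rebalance(loads: Sequence[float], threshold: float = 1.2) -> bool:
    """Trigger a repartition only when the imbalance exceeds the threshold."""
    return imbalance_factor(loads) > threshold


if __name__ == "__main__":
    # Hypothetical per-process (particle count, cell count) pairs.
    per_process = [(9000, 500), (1000, 500), (1000, 500), (1000, 500)]
    loads = [weighted_load(p, c) for p, c in per_process]
    print(imbalance_factor(loads), needs_rebalance(loads))
```

Checking a threshold before repartitioning is one common way to keep the balancer lightweight: the grid is remapped only when the estimated gain is likely to outweigh the cost of migrating cells and particles.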