The Fast Multipole Boundary Element Method (FMBEM) reduces the O(N²) computational and memory complexity of the conventional BEM, discretized with N boundary unknowns, to O(N log N) and O(N), respectively. A number of massively parallel FMBEM models have been developed over the last decade or so for CPU, GPU and heterogeneous architectures, capable of utilizing hundreds of thousands of CPU cores to treat problems with billions of degrees of freedom (dof). At the opposite end of this spectrum, small-scale parallelization of the FMBEM for the typical workstation computers available to many researchers allows a number of simplifications in the parallelization strategy. In this paper, a novel parallel broadband Helmholtz FMBEM model is presented which uses a simple columnwise distribution scheme, element reordering and rowwise compression of data to parallelize all stages of the fast multipole method (FMM) algorithm with minimal communication overhead. The sparse BEM near-field matrix and the sparse approximate inverse preconditioner are also constructed and applied in parallel, while the flexible generalized minimum residual (fGMRES) solver has been modified to apply the FMBEM matrix-vector products and the corresponding minimum-residual convergence checks within the parallel environment. The algorithmic and memory complexities of the resulting parallel FMBEM model are shown to reaffirm the above estimates in both the serial and parallel configurations. The parallel efficiency (PE) of the FMBEM matrix-vector products and the fGMRES solution is shown to be satisfactory, with PEs of up to [Formula: see text] and [Formula: see text] achieved in the fGMRES solution using 3 and 6 CPU cores, respectively, when applied to models having [Formula: see text] dof per CPU core.
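The columnwise distribution idea mentioned above can be illustrated on a plain dense matrix-vector product. The sketch below is a toy model, not the paper's FMM implementation: each hypothetical "worker" owns a contiguous block of matrix columns and the matching slice of the input vector, computes a partial product independently, and a single summation (the analogue of one allreduce) recovers the full result, which is why the communication overhead of such a scheme stays minimal.

```python
import numpy as np

def columnwise_matvec(A, x, n_workers):
    """Toy column-block distributed mat-vec.

    Each simulated worker owns a contiguous block of columns of A and
    the matching entries of x, and forms a partial product locally.
    Summing the partials (one reduction) yields the full product A @ x.
    """
    n = A.shape[1]
    col_blocks = np.array_split(np.arange(n), n_workers)  # near-equal column blocks
    partials = [A[:, cols] @ x[cols] for cols in col_blocks]  # independent local work
    return np.sum(partials, axis=0)  # single sum-reduction across workers

# Complex-valued example, as arises in Helmholtz BEM systems.
rng = np.random.default_rng(0)
A = rng.standard_normal((8, 8)) + 1j * rng.standard_normal((8, 8))
x = rng.standard_normal(8) + 1j * rng.standard_normal(8)
assert np.allclose(columnwise_matvec(A, x, 3), A @ x)
```

In an actual FMBEM the "matrix" is applied implicitly through the FMM tree, but the same partitioning principle (local partial products followed by one reduction per iteration) carries over to the far-field operator.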
The PE of the precalculation stages of the FMBEM (in particular the FMM precomputation stage, which is largely unparallelized) reduces the overall PE of the FMBEM model, resulting in average efficiencies of [Formula: see text] and [Formula: see text] for the 3-core and 6-core models when treating problems with [Formula: see text] dof per CPU core. The present model is able to treat large-scale acoustic scattering problems involving up to [Formula: see text] dof on a workstation computer equipped with 128 GB of RAM, while acoustic target strength (TS) results calculated up to 3 kHz for the BeTSSi II submarine model demonstrate its capabilities for large-scale TS modeling.