We use visual image motion to judge the movement of objects, as well as our own movements through the environment. Generally, image motion components caused by object motion and self-motion are confounded in the retinal image. Thus, to estimate heading, the brain would ideally marginalize out the effects of object motion (or vice versa), but little is known about how this is accomplished neurally. Behavioral studies suggest that vestibular signals play a role in dissociating object motion and self-motion, and recent computational work suggests that a linear decoder can approximate marginalization by taking advantage of diverse multisensory representations. By measuring responses of MSTd neurons in two male rhesus monkeys and by applying a recently-developed method to approximate marginalization by linear population decoding, we tested the hypothesis that vestibular signals help to dissociate self-motion and object motion. We show that vestibular signals stabilize tuning for heading in neurons with congruent visual and vestibular heading preferences, whereas they stabilize tuning for object motion in neurons with discrepant preferences. Thus, vestibular signals enhance the separability of joint tuning for object motion and self-motion. We further show that a linear decoder, designed to approximate marginalization, allows the population to represent either self-motion or object motion with good accuracy. Decoder weights are broadly consistent with a readout strategy, suggested by recent computational work, in which responses are decoded according to the vestibular preferences of multisensory neurons. These results demonstrate, at both single neuron and population levels, that vestibular signals help to dissociate self-motion and object motion.