Auditory streaming and visual plaids have been used extensively to study perceptual organization in each modality. Both stimuli can produce bistable alternations between grouped (one object) and split (two objects) interpretations. They also share two peculiar features: (i) at the onset of stimulus presentation, organization starts with a systematic bias towards the grouped interpretation; (ii) this first percept has 'inertia'; it lasts longer than the subsequent ones. As a result, the probability of forming different objects builds up over time, a landmark of both behavioural and neurophysiological data on auditory streaming. Here we show that first percept bias and inertia are independent. In plaid perception, inertia is due to a depth ordering ambiguity in the transparent (split) interpretation that makes plaid perception tristable rather than bistable: experimental manipulations removing the depth ambiguity suppressed inertia. However, the first percept bias persisted. We attempted a similar manipulation for auditory streaming by introducing level differences between streams, to bias which stream would appear in the perceptual foreground. Here both inertia and first percept bias persisted. We thus argue that the critical common feature of the onset of perceptual organization is the grouping bias, which may be related to the transition from temporally/ spatially local to temporally/spatially global computation.