“…In [11] the optimal number of mixture components is learnt using a group-sparsity inducing norm. An important difference of our approach compared to these works is that we model the grouping of instances as a continuous soft-assignment instead of a discrete labeling.…”