Particle Swarm Optimization (PSO) has become a popular feature selection method for classification problems, owing to its strong search capability and computational simplicity. Classification problems such as facial emotion recognition often involve data sets with large numbers of features, not all of which are useful for classification. Redundant and irrelevant features can degrade the accuracy of facial emotion recognition systems. Feature selection identifies the most relevant features in order to improve classification performance. Although PSO has been applied with some success as a feature selection method in facial emotion recognition systems, it remains susceptible to premature convergence. This work presents seven PSO variants that mitigate premature convergence by incorporating three random probability distributions (Cauchy, Gaussian and Lévy). At each iteration of the proposed PSO models, these distributions are used to increase search diversity and to reduce the number of redundant features used for classification. When tested on real-world data sets, the seven PSO variants outperformed the standard PSO model and other related work in the field.
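To make the general idea concrete, the following is a minimal illustrative sketch of PSO-based feature selection in which a random jump drawn from a Cauchy, Gaussian or Lévy distribution perturbs the global best solution at each iteration. It is not one of the seven variants presented in this work: the perturbation rule, the 0.5 selection threshold, the parameter values, and all names (`pso_select`, `toy_fitness`, `jump`, `scale`) are assumptions made for illustration, and `toy_fitness` is a hypothetical stand-in for the accuracy of a real classifier.

```python
import math
import random

def cauchy(rng):
    # Standard Cauchy draw via the inverse CDF.
    return math.tan(math.pi * (rng.random() - 0.5))

def gaussian(rng):
    # Standard normal draw.
    return rng.gauss(0.0, 1.0)

def levy(rng, beta=1.5):
    # Mantegna's algorithm for an approximately Levy-distributed step.
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    sigma_u = (num / den) ** (1 / beta)
    u = rng.gauss(0.0, sigma_u)
    v = rng.gauss(0.0, 1.0)
    return u / (abs(v) or 1e-12) ** (1 / beta)

def clip(x):
    # Keep positions inside [0, 1].
    return min(1.0, max(0.0, x))

def toy_fitness(position, relevant=frozenset({0, 1, 2}), penalty=0.05):
    # Hypothetical stand-in for classifier accuracy: reward selected
    # relevant features and penalise every selected feature.
    selected = {i for i, x in enumerate(position) if x > 0.5}
    return len(selected & relevant) / len(relevant) - penalty * len(selected)

def pso_select(fitness, dim, jump, n_particles=20, iters=50, seed=0,
               w=0.7, c1=1.5, c2=1.5, scale=0.1):
    rng = random.Random(seed)
    pos = [[rng.random() for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                # Standard PSO velocity and position update.
                vel[i][d] = (w * vel[i][d]
                             + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                             + c2 * rng.random() * (gbest[d] - pos[i][d]))
                pos[i][d] = clip(pos[i][d] + vel[i][d])
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
        # Assumed diversity step: perturb the global best with a random
        # jump and keep the candidate only if it scores higher.
        cand = [clip(x + scale * jump(rng)) for x in gbest]
        f = fitness(cand)
        if f > gbest_fit:
            gbest, gbest_fit = cand, f
    # A feature counts as selected when its position exceeds 0.5.
    return [i for i, x in enumerate(gbest) if x > 0.5], gbest_fit
```

Calling `pso_select(toy_fitness, 8, levy)` returns the indices of the selected features together with their fitness; passing `cauchy` or `gaussian` instead swaps the distribution used for the jump. Heavy-tailed draws (Cauchy, Lévy) occasionally produce long jumps, which is one plausible way such distributions could help a swarm escape a local optimum, while Gaussian draws favour small local refinements.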