“…In Table 1, for each reproducible paper, we identified the recommendation task (RP: Rating Prediction; TR: Top-N Recommendation), the notion of consumer fairness (EQ: equity of the error/utility score across demographic groups; IND: independence of the predicted relevance scores or recommendations from the demographic group), the consumers' grouping (G: Gender; A: Age; O: Occupation; B: Behavioral), the mitigation type (PRE-, IN-, or POST-Processing), the evaluation data sets (ML: MovieLens 1M or 10M; LFM: LastFM 1K or 360K; AM: Amazon; SS: Sushi; SY: Synthetic), the utility/accuracy metrics (NDCG: Normalized Discounted Cumulative Gain; F1: F1 Score; AUC: Area Under Curve; MRR: Mean Reciprocal Rank; RMSE: Root Mean-Square Error; MAE: Mean Absolute Error; …) … We identified [26,27,20,25] and [4,17,14] as non-reproducible procedures according to our criteria for top-N recommendation and rating prediction, respectively.…”
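
The EQ notion in the legend above (equity of the error/utility score across demographic groups) can be illustrated for the rating prediction task with a minimal Python sketch. It computes RMSE separately per demographic group and reports the largest difference between groups. The function names, the toy records, and the choice of a max-minus-min gap are assumptions made for illustration only; the excerpt does not state which fairness metric any of the surveyed papers uses.

```python
import math
from collections import defaultdict


def rmse(errors):
    """Root mean-square error of a non-empty list of prediction errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def groupwise_rmse(records):
    """Compute RMSE per demographic group.

    records: iterable of (group, true_rating, predicted_rating) tuples.
    Returns a dict mapping each group label to its RMSE.
    """
    errors = defaultdict(list)
    for group, truth, pred in records:
        errors[group].append(pred - truth)
    return {g: rmse(errs) for g, errs in errors.items()}


def equity_gap(scores):
    """Illustrative (hypothetical) equity measure: largest minus smallest group score."""
    return max(scores.values()) - min(scores.values())


# Toy data, invented for this example: (group, true rating, predicted rating).
records = [
    ("F", 4.0, 3.0),
    ("F", 2.0, 2.0),
    ("M", 5.0, 4.5),
    ("M", 3.0, 3.5),
]

scores = groupwise_rmse(records)
gap = equity_gap(scores)
print(scores, gap)
```

A gap of zero would mean both groups receive equally accurate predictions under this measure; the same pattern applies to a utility metric such as NDCG in the top-N setting, with the per-group error replaced by the per-group utility.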