Random Forests (RF) are a very widely used modelling tool. Lundberg et al. (2019) conclude that no nonlinear models are more widely used, from health care to academia to industry, than random forests and decision trees. The bounds of the methodology are still being extended: Bayat et al. (2020) give an example with 80 million variables. It is highly desirable that RF models be made more interpretable, and a large part of that is a better understanding of the characteristics of the variable importance measures generated by the RF. Because it is fast and easy to calculate, we consider the mean decrease in node "impurity" (MDI) variable importance (VI) and address the question of setting a significance level. The report is organized as follows:
• We first consider the question of multiple testing in the case of multiple measurements made on two groups (the standard microarray setup; Efron, 2008). We show that some standard approaches to multiple testing can fail severely due to the correlation structure of the measurements and other modelling failures. We also show that deriving the null distribution by permutation does not fix the problem. This point applies to determining the null distribution of variable importances as well as to many other statistical tests;
• We show that variable correlation can either increase or decrease the MDI of variables, depending on the setting. We also show that there is an additional problem with the permutation null due to the functional relationships between the statistics;
• We consider the empirical Bayes argument of Efron (2005) and model the VI as a mixture of two distributions, a null and a non-null distribution. We find that, unlike the relatively well-behaved case considered in Efron's papers, there are a number of issues here:
– the distribution may be multi-modal, which creates modelling difficulties;
– the null distribution is not of an obvious form, as it is not symmetric.
• We resolve these issues to derive a fast, plausible, empirical Bayes method for selecting significant variables while controlling the false discovery rate.
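For orientation, the two-group mixture referred to in the third bullet is usually written in the form below. This is a sketch of the standard empirical Bayes formulation, not necessarily the exact parameterisation used later in this report; the symbols $v$ (an observed variable importance), $\pi_0$ (the null proportion), $f_0$ and $f_1$ (the null and non-null densities) are notation assumed here for illustration.

```latex
% Two-group mixture for an observed importance v (notation assumed for illustration)
\begin{align}
  f(v) &= \pi_0 f_0(v) + (1 - \pi_0) f_1(v), \\
  \operatorname{fdr}(v) &= \Pr(\text{null} \mid v) = \frac{\pi_0 f_0(v)}{f(v)} .
\end{align}
```

Under this formulation a variable is declared significant when its estimated local false discovery rate $\operatorname{fdr}(v)$ falls below a chosen threshold. Estimating it requires a fit of the mixture density $f$ and a model for the null density $f_0$, which is where the multi-modality and asymmetry noted above cause difficulty.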