“…The best‐fitting substitution model across all partitions for the nucleotide dataset was a general time‐reversible substitution model (GTR; Tavaré) with rate heterogeneity described by a gamma distribution discretized into four bins (+G; Yang) and a proportion of invariant sites (+I; Fitch & Margoliash). We did not use the GTR + I + G mixture model (Gu et al.; Waddell & Steel) because this approach has been highly criticized on both empirical and theoretical grounds (Yang; Sullivan et al.; Mayrose et al.; Jia et al.). Studies indicate that some of the parameters of the +I and +G models cannot be optimized independently of each other (Yang; Jia et al.); indeed, the estimated proportion of invariable sites was demonstrated to be highly susceptible to changes in the number of gamma rate categories of the +G model (Jia et al.).…”
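
To make the "+G discretized into four bins" and "+I" terminology concrete, the following is a minimal sketch of how discrete gamma rate categories and an invariant-site class can be combined. It is not the authors' code. It assumes the mean-per-category discretization (the passage does not say whether means or medians were used), and it assumes one common convention in which variable-site rates are rescaled by 1 / (1 - p_inv) so the overall mean rate stays at 1; software packages may differ on this. The function names (`discrete_gamma_rates`, `site_rate_classes`) and the parameter values are hypothetical and chosen only for illustration.

```python
import math


def reg_lower_gamma(a, x):
    """Regularized lower incomplete gamma P(a, x), via its power series."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    n = 1
    while abs(term) > 1e-16 * abs(total) and n < 10000:
        term *= x / (a + n)
        total += term
        n += 1
    return total * math.exp(a * math.log(x) - x - math.lgamma(a))


def gamma_quantile(a, p):
    """Quantile of a Gamma(shape=a, scale=1) variable, found by bisection."""
    lo, hi = 0.0, 1.0
    while reg_lower_gamma(a, hi) < p:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if reg_lower_gamma(a, mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def discrete_gamma_rates(alpha, k=4):
    """Mean rate of each of k equal-probability bins of a mean-1 gamma.

    Rates r follow Gamma(shape=alpha, rate=alpha); cut points are
    expressed in units of x = alpha * r.
    """
    cuts = [0.0] + [gamma_quantile(alpha, i / k) for i in range(1, k)]
    # Integral of r * f(r) up to each cut equals P(alpha + 1, x).
    cum = [reg_lower_gamma(alpha + 1.0, x) for x in cuts] + [1.0]
    return [k * (cum[i + 1] - cum[i]) for i in range(k)]


def site_rate_classes(alpha, p_inv, k=4):
    """(weight, rate) classes for a +I+G mixture (assumed convention)."""
    scale = 1.0 / (1.0 - p_inv)  # assumption: keeps the overall mean rate at 1
    classes = [(p_inv, 0.0)]  # invariant-site class
    classes += [((1.0 - p_inv) / k, r * scale)
                for r in discrete_gamma_rates(alpha, k)]
    return classes


if __name__ == "__main__":
    # Hypothetical shape parameter; compare 4 and 8 categories.
    for k in (4, 8):
        print(k, [round(r, 4) for r in discrete_gamma_rates(0.5, k)])
```

Comparing the output for 4 and 8 categories shows the slowest gamma bin moving closer to a rate of zero as the number of categories grows. Under this sketch's assumptions, that is one intuition for why a near-zero gamma category and an explicit invariant-site class can compete to explain the same constant sites, consistent with the sensitivity described in the passage.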