“…Most current methods [17], [18] approximate the posterior over architecture depth with variational inference (VI) under the mean-field assumption [8], [9], in which the neural weights and the depth variables are treated as independent. The mean-field assumption can limit the fidelity of the approximation and introduce the rich-get-richer problem, i.e., shallow networks come to dominate the search.…”