We present a mathematical analysis of the Wang-Landau algorithm, prove its convergence, identify the sources of error, and discuss strategies for optimization. In particular, we find that the histogram increases uniformly with small fluctuations after an initial accumulation stage, and that the statistical error scales as √(ln f) with the modification factor f. This has implications for strategies to obtain fast convergence.

PACS numbers: 02.70.Tt, 02.70.Rr, 02.50.Fz, 02.50.Ey

The Wang-Landau (WL) algorithm [1] has been applied to a number of interesting problems [1,2,3,4,5,6]. It overcomes some difficulties of other Monte Carlo algorithms, such as critical slowing down and the long relaxation times caused by frustration and complex energy landscapes. Like the Metropolis algorithm, it is generic, independent of the details of the physical system. Many methods have been suggested to improve the algorithm for certain types of systems [8,9,10]. The same mechanism also appears in recent research on molecular dynamics simulations [11]. Among the studies characterizing and improving the efficiency of the algorithm, Dayal et al. [12] showed that the WL algorithm considerably reduces the tunneling time, and Trebst et al. [7] proposed an algorithm that performs even better in terms of tunneling time. However, the WL algorithm has so far been used as an empirical method, and many important questions remain unanswered: (i) how is the flatness of the histogram related to the accuracy; (ii) what is the relation between the modification factor and the error; and (iii) how does the simulation actually find the density of states? The convergence of the WL algorithm should be guaranteed by a generic principle, in the same sense that detailed balance assures the convergence of the Metropolis algorithm.
However, the WL algorithm differs from the Metropolis algorithm in that it is not a Markov process.

In this paper we study this algorithm from an analytical approach and try to answer the questions raised above. Our analysis provides a proof of the convergence of the method and estimates of the errors and the computational time, along with some strategies for optimization and parallelization.

The goal of the WL algorithm is to accumulate knowledge about the density of states ρ(E) during a Metropolis-type MC sampling. The Metropolis-type random walk is characterized by the acceptance ratio min{1, g(E_j)/g(E_i)}, where g(E) is a function of energy playing a role similar to the Boltzmann factor in the usual Metropolis algorithm, and E_i and E_j are the energies before and after the transition. The acceptance ratio biases the free random walk and produces a final histogram h(E), which is related to the equilibrium distribution ρ(E) of the unbiased random walk by ρ(E)g(E) = h(E), provided that both sides of the identity are normalized. This identity is essentially a consequence of detailed balance. The WL algorithm divides g(E_j) by a modification factor f after each transition, expecting g(E) to converge to 1/ρ(E) and the histogram h(E) to become flat. ρ(E) is a priori unknown in the simulation. We beg...
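The update rule just described can be sketched in a few lines of Python. The toy state space, the uniform proposal move, and all parameter values below are illustrative assumptions, not part of the original analysis; the sketch only mirrors the two operations named above, the acceptance ratio min{1, g(E_j)/g(E_i)} and the division of g(E) by f after each transition. Working with S(E) = ln g(E) keeps the arithmetic stable, so dividing g(E) by f becomes subtracting ln f from S(E).

```python
import math
import random

def wang_landau(energies, n_steps=200_000, ln_f=1.0, seed=0):
    """Wang-Landau random walk over a toy discrete state space.

    energies[i] is the energy of state i.  We store S(E) = ln g(E), so the
    acceptance ratio min{1, g(E_j)/g(E_i)} becomes min{1, exp(S_j - S_i)},
    and dividing g(E) by f corresponds to subtracting ln f from S(E).
    """
    rng = random.Random(seed)
    levels = sorted(set(energies))
    S = {E: 0.0 for E in levels}      # running estimate of ln g(E)
    h = {E: 0 for E in levels}        # visit histogram h(E)
    i = rng.randrange(len(energies))  # current state of the random walk
    for _ in range(n_steps):
        j = rng.randrange(len(energies))            # free random-walk proposal
        dS = S[energies[j]] - S[energies[i]]
        if dS >= 0 or rng.random() < math.exp(dS):  # min{1, g(E_j)/g(E_i)}
            i = j
        E = energies[i]
        S[E] -= ln_f   # divide g(E) by f after each transition
        h[E] += 1
    return S, h
```

For example, on a toy system with three states of energy 0 and one of energy 1, ρ(0)/ρ(1) = 3, so g(E) should converge (up to normalization) to 1/ρ(E), i.e. S(0) − S(1) → −ln 3, with the histogram becoming flat; the residual fluctuation of S reflects the √(ln f) error scaling discussed in the abstract.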