“…After that, it follows the score function of the joint posterior density of the perturbed variables, starting with high noise levels σ_{1,X} and σ_{1,H} and progressively reducing them to σ_{L,X} ≈ σ_{L,H} ≈ 0. At early noise levels, the likelihood term directs the dynamics toward an estimate driven mainly by the measurements, while at later noise levels the prior refines the estimate, as explained further in [13]. The benefits of annealing are threefold: it is used to train the score network via score matching, it improves the mixing of the dynamics, and it allows discrete variables to be approximated by continuous ones.…”
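
The annealing procedure described in the excerpt can be illustrated with a minimal toy sketch. It is not the cited authors' implementation. Everything in it is an assumption made for illustration: a scalar measurement model y = h·x + noise, standard Gaussian priors on x and h (so the score of each perturbed prior is available in closed form instead of coming from a trained score network), a geometric noise schedule, a step size proportional to σ², and a likelihood variance inflated by the current noise levels. All function and parameter names are hypothetical.

```python
import numpy as np


def geometric_schedule(sigma_start, sigma_end, num_levels):
    """Decreasing noise levels sigma_1 > ... > sigma_L (assumed geometric)."""
    return np.geomspace(sigma_start, sigma_end, num_levels)


def prior_score(v, sigma):
    """Score of a N(0, 1) prior perturbed by N(0, sigma^2) noise.

    The perturbed density is N(0, 1 + sigma^2), so the score is -v / (1 + sigma^2).
    In practice this would be replaced by a trained score network.
    """
    return -v / (1.0 + sigma**2)


def likelihood_score(x, h, y, noise_std, sigma_x, sigma_h):
    """Gradient of log N(y; h * x, var) with respect to x and h.

    Assumption: the variance is inflated by the current noise levels so the
    likelihood term stays well conditioned at high noise.
    """
    var = noise_std**2 + sigma_x**2 + sigma_h**2
    residual = y - h * x
    return h * residual / var, x * residual / var


def annealed_langevin(y, sigmas_x, sigmas_h, noise_std=0.1,
                      steps_per_level=50, step_scale=0.01, seed=0):
    """Jointly sample (x, h) by Langevin dynamics over decreasing noise levels."""
    rng = np.random.default_rng(seed)
    # Initialise at the scale of the largest noise level.
    x = sigmas_x[0] * rng.normal()
    h = sigmas_h[0] * rng.normal()
    for sx, sh in zip(sigmas_x, sigmas_h):
        # Step size shrinks with the noise level (assumed rule: step ~ sigma^2).
        step_x = step_scale * sx**2
        step_h = step_scale * sh**2
        for _ in range(steps_per_level):
            gx_lik, gh_lik = likelihood_score(x, h, y, noise_std, sx, sh)
            # Posterior score = prior score + likelihood score.
            gx = prior_score(x, sx) + gx_lik
            gh = prior_score(h, sh) + gh_lik
            x = x + step_x * gx + np.sqrt(2.0 * step_x) * rng.normal()
            h = h + step_h * gh + np.sqrt(2.0 * step_h) * rng.normal()
    return float(x), float(h)


if __name__ == "__main__":
    sigmas = geometric_schedule(1.0, 0.01, 10)
    x_hat, h_hat = annealed_langevin(y=1.0, sigmas_x=sigmas, sigmas_h=sigmas)
    print(x_hat, h_hat, x_hat * h_hat)
```

In this sketch the same schedule is used for both variables, but `sigmas_x` and `sigmas_h` are kept separate so that σ_{i,X} and σ_{i,H} can differ, as the notation in the excerpt suggests.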