The confident high-throughput identification of small molecules remains one of the most challenging tasks in mass spectrometry-based metabolomics. SIRIUS has become a powerful tool for the interpretation of tandem mass spectra and shows outstanding performance for identifying the molecular formula of a query compound, which is the first step of structure identification. Nevertheless, identifying the molecular formulas of large compounds (above 500 Da) and identifying novel molecular formulas both remain highly challenging. Here, we present ZODIAC, a network-based algorithm for the de novo estimation of molecular formulas. ZODIAC reranks SIRIUS' molecular formula candidates, combining fragmentation tree computation with Bayesian statistics using Gibbs sampling. Through careful algorithm engineering, ZODIAC's Gibbs sampling is very fast in practice. ZODIAC decreases incorrect annotations 16.2-fold on a challenging plant extract dataset with most compounds above 700 Da; we then show improvements on four additional, diverse datasets. Our analysis led to the discovery of compounds with novel molecular formulas such as C24H47BrNO8P, which, as of today, is not present in any publicly available molecular structure database.

Ludwig, Nothias, Dührkop, et al.

best performance [9–12]. One reason for CSI:FingerID's improved performance is the integration of SIRIUS [10], which deduces the molecular formula of each query as the first step of its analysis. Other tools filter candidates using the query precursor mass, reducing molecular formula annotation to a byproduct. This worsens identification rates [11] and can result in severe hidden prior problems [13,14].

Identifying the molecular formula is also the very first step in structural elucidation using Nuclear Magnetic Resonance (NMR) or X-ray crystallography, guiding data interpretation based on the atoms present and the degree of unsaturation.
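To make the idea of network-based reranking with Gibbs sampling concrete, here is a minimal toy sketch. It is not ZODIAC's actual model or scoring: we assume each compound has a list of candidate formulas with prior probabilities (standing in for per-compound candidate scores such as SIRIUS'), and that hypothetical edge weights reward pairs of candidates on different compounds that are assumed to support each other. The names `gibbs_rerank`, `candidates`, `priors` and `edges`, the exponential edge term, and the placeholder formulas are all illustrative assumptions. The sampler repeatedly redraws each compound's formula conditioned on the current choices of its neighbours and reports how often each candidate was selected after burn-in, which estimates its posterior marginal.

```python
import math
import random
from collections import defaultdict


def gibbs_rerank(candidates, priors, edges, iterations=2000, burn_in=200, seed=0):
    """Toy Gibbs sampler that picks one formula candidate per compound.

    ASSUMED toy model (not ZODIAC's scoring):
        P(assignment) ~ prod_i priors[i][k_i] * exp(sum of weights of edges
                        whose two endpoint candidates are both selected)
    candidates: compound -> list of candidate formulas (hypothetical input)
    priors:     compound -> list of positive prior probabilities, same order
    edges:      ((compound_a, idx_a), (compound_b, idx_b)) -> edge weight
    Returns compound -> list of estimated posterior marginals per candidate.
    """
    rng = random.Random(seed)
    compounds = list(candidates)

    # Adjacency list: (compound, candidate index) -> [((other, other_idx), weight)]
    neighbours = defaultdict(list)
    for (u, v), w in edges.items():
        neighbours[u].append((v, w))
        neighbours[v].append((u, w))

    # Initialise every compound with its highest-prior candidate.
    state = {c: max(range(len(priors[c])), key=priors[c].__getitem__) for c in compounds}
    counts = {c: [0] * len(candidates[c]) for c in compounds}

    for it in range(iterations):
        for c in compounds:
            # Full conditional of compound c given all other compounds' choices.
            weights = []
            for k, p in enumerate(priors[c]):
                support = sum(w for (other, l), w in neighbours[(c, k)] if state[other] == l)
                weights.append(p * math.exp(support))
            state[c] = rng.choices(range(len(weights)), weights=weights)[0]
        if it >= burn_in:
            for c in compounds:
                counts[c][state[c]] += 1

    n_samples = iterations - burn_in
    return {c: [x / n_samples for x in counts[c]] for c in compounds}


if __name__ == "__main__":
    # Placeholder candidates: compound A is ambiguous on its own (0.5 / 0.5),
    # compound B is fairly certain, and one assumed edge links A[0] with B[0].
    candidates = {"A": ["C10H12N2O", "C11H12O2"], "B": ["C11H14N2O", "C12H14O2"]}
    priors = {"A": [0.5, 0.5], "B": [0.9, 0.1]}
    edges = {(("A", 0), ("B", 0)): 3.0}
    print(gibbs_rerank(candidates, priors, edges))
```

In this toy setting the edge to the confidently annotated compound B pulls compound A towards its first candidate, even though A's own priors are tied. That is the intuition behind using co-occurring compounds in a network to rerank candidates.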
The confident annotation of molecular formulas from mass spectrometry data is far from trivial, especially if executed de novo (without a structure database): here, the number of candidate molecular formulas grows rapidly with compound size and with the inclusion of elements beyond CHNOPS. To counter this growth, one can use heuristic constraints [15] or use only molecular formulas from some structure database [16,17]. Restricting the search space will improve the performance of a method in evaluation, but will prevent the discovery of novel molecular formulas in application.

Arguably the best-performing computational method for molecular formula annotation is SIRIUS 4 [10], which combines isotope pattern matching [15,18–23] and MS/MS fragmentation tree computation [22,24–26]. SIRIUS reaches best-of-class performance without filtering or meta-scores [24]. But even SIRIUS has problems annotating molecular formulas for compounds above 500 Da: Böcker & Dührkop [24] found that the percentage of correctly identified molecular formulas dropped substantially for larger masses.

An alternative approach to annotate molecular formulas for a complete LC-MS run uses Gibbs sampling and Bayesian statistics, utilizing co-occurrence ...
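The growth of the candidate space with mass can be illustrated with a naive brute-force enumeration. This is not the faster mass decomposition approach that dedicated tools use. The function name `candidate_formulas`, the 5 ppm tolerance, the cap of at most two P and two S atoms, and the simplified ring-plus-double-bond (RDBE) filter are all assumptions made for illustration: P is treated as trivalent and S as divalent, and only neutral, even-electron molecules are kept. These restrictions are themselves an example of the heuristic constraints mentioned above. The sketch loops over C, N, O, P and S counts, solves for the single hydrogen count that could fit the target mass, and keeps formulas within tolerance.

```python
# Monoisotopic masses of the most abundant isotopes (in Da).
MASS = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048,
        "O": 15.99491461956, "P": 30.97376163, "S": 31.97207100}


def monoisotopic_mass(c, h, n, o, p=0, s=0):
    """Neutral monoisotopic mass of the formula CcHhNnOoPpSs."""
    return (c * MASS["C"] + h * MASS["H"] + n * MASS["N"] + o * MASS["O"]
            + p * MASS["P"] + s * MASS["S"])


def candidate_formulas(mass, ppm=5.0, max_p=2, max_s=2):
    """Brute-force CHNOPS formulas within `ppm` of a neutral mass (toy sketch).

    ASSUMPTIONS: at most max_p P and max_s S atoms; simplified RDBE filter
    (P trivalent, S divalent); only even-electron neutral molecules.
    Returns a list of (C, H, N, O, P, S) count tuples.
    """
    tol = mass * ppm * 1e-6
    hits = []
    for c in range(int(mass // MASS["C"]) + 1):
        m_c = c * MASS["C"]
        for n in range(int((mass - m_c) // MASS["N"]) + 1):
            m_cn = m_c + n * MASS["N"]
            for o in range(int((mass - m_cn) // MASS["O"]) + 1):
                m_cno = m_cn + o * MASS["O"]
                for p in range(max_p + 1):
                    for s in range(max_s + 1):
                        rest = m_cno + p * MASS["P"] + s * MASS["S"]
                        if rest > mass + tol:
                            break  # adding more S only increases the mass
                        # The tolerance is far below 1 Da, so at most one H count fits.
                        h = round((mass - rest) / MASS["H"])
                        if h < 0 or abs(rest + h * MASS["H"] - mass) > tol:
                            continue
                        # 2 * RDBE = 2C + 2 + N + P - H must be a non-negative even number.
                        twice_rdbe = 2 * c + 2 + n + p - h
                        if twice_rdbe < 0 or twice_rdbe % 2:
                            continue
                        hits.append((c, h, n, o, p, s))
    return hits


if __name__ == "__main__":
    small = candidate_formulas(monoisotopic_mass(8, 10, 4, 2))   # caffeine, ~194 Da
    large = candidate_formulas(monoisotopic_mass(30, 50, 2, 8))  # a ~566 Da formula
    print(len(small), len(large))
```

Even under these restrictive assumptions, the ~566 Da mass admits many more candidate formulas than the ~194 Da mass. This is partly because the absolute tolerance widens with mass at a fixed ppm, and partly because the number of element combinations grows. Relaxing the P/S caps or adding further elements enlarges the candidate lists further.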