A big data framework to validate thermodynamic data for chemical species

Buerger, Philipp; Akroyd, Jethro; Martin, Jacob W.; Kraft, Markus

doi:10.1016/j.combustflame.2016.11.006

Cited by 9 publications

(51 citation statements)

References 51 publications


“…Our method is benchmarked on three different sets of data: (1) ∼7000 molecules consisting of C, O, N, S, H from the QM7 database [42,43], (2) 920 hydrocarbon species from NIST chemistry webbook [44,45], and (3) 591 surface intermediates on a transition metal facet [46]. The QM7 dataset contains atomization energies (computed using density functional theory, or DFT) of molecules with up to seven (7) non-H atoms.…”

Section: Datasets (mentioning)

confidence: 99%

Designing compact training sets for data-driven molecular property prediction through optimal exploitation and exploration

Rangarajan

2019

Mol. Syst. Des. Eng.


In this paper, we consider the problem of designing a training set using the most informative molecules from a specified library to build data-driven molecular property models. Specifically, using (i) sparse generalized group additivity and (ii) kernel ridge regression as two representative classes of models, we propose a method combining rigorous model-based design of experiments and cheminformatics-based diversity-maximizing subset selection within the ε-greedy framework to systematically minimize the amount of data needed to train these models. We demonstrate the effectiveness of the algorithm on subsets of various databases, including QM7, NIST, and a catalysis dataset. For sparse group additive models, a balance between exploration (diversity-maximizing selection) and exploitation (D-optimality selection) leads to learning with a fraction (sometimes as little as 15%) of the data to achieve similar accuracy as five-fold cross validation on the entire set. On the other hand, kernel ridge regression prefers diversity-maximizing selections.

arXiv:1906.10273v1 [physics.data-an]
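The exploration/exploitation trade-off described in this abstract can be illustrated with a minimal sketch. It is not the citing authors' implementation: the function names, the choice to explore with probability epsilon, the arbitrary starting molecule, and the small ridge term that keeps the information matrix non-singular are all assumptions made for illustration. Each molecule is represented by a plain feature vector (for example, group counts).

```python
import random


def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n, d = len(a), 1.0
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(a[r][i]))
        if abs(a[p][i]) < 1e-12:
            return 0.0
        if p != i:
            a[i], a[p] = a[p], a[i]
            d = -d
        d *= a[i][i]
        for r in range(i + 1, n):
            f = a[r][i] / a[i][i]
            for c in range(i, n):
                a[r][c] -= f * a[i][c]
    return d


def info_det(rows, ridge=1e-6):
    """det(X^T X + ridge * I) for the design matrix whose rows are `rows`.

    The ridge term is an assumption added here so the criterion is
    defined even when fewer rows than features have been selected.
    """
    k = len(rows[0])
    m = [[ridge if i == j else 0.0 for j in range(k)] for i in range(k)]
    for x in rows:
        for i in range(k):
            for j in range(k):
                m[i][j] += x[i] * x[j]
    return det(m)


def min_sq_dist(x, chosen_rows):
    """Squared Euclidean distance from x to its nearest selected row."""
    return min(sum((a - b) ** 2 for a, b in zip(x, c)) for c in chosen_rows)


def epsilon_greedy_select(features, n_select, epsilon, seed=0):
    """Pick n_select row indices from `features`.

    With probability epsilon: explore (max-min distance, a simple
    diversity-maximizing rule). Otherwise: exploit (greedy D-optimality,
    i.e. maximize the determinant of the information matrix).
    """
    rng = random.Random(seed)
    chosen = [0]  # assumption: start from the first molecule
    while len(chosen) < n_select:
        pool = [i for i in range(len(features)) if i not in chosen]
        picked = [features[i] for i in chosen]
        if rng.random() < epsilon:
            nxt = max(pool, key=lambda i: min_sq_dist(features[i], picked))
        else:
            nxt = max(pool, key=lambda i: info_det(picked + [features[i]]))
        chosen.append(nxt)
    return chosen
```

Setting epsilon to 0 gives purely D-optimal selection and setting it to 1 gives purely diversity-maximizing selection; intermediate values mix the two, which is the balance the abstract reports as useful for sparse group additive models.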


“…The isodesmic and isogyric reaction classes were used in this work [44,45,46]. This methodology has been extensively tested and validated for test data sets including carbon, hydrogen, oxygen, chlorine and titanium [47,48].…”

Section: Enthalpy Correction (mentioning)

confidence: 99%

“…The effect of the reference data (Table 1) on the accuracy of the method was assessed using a cross-validation technique. The method was described in full elsewhere [47,48] and is only summarised here. The standard enthalpy of formation was iteratively estimated for each species in the reference set, assuming that the enthalpy of the species under investigation is unknown.…”

Section: Enthalpy Correction (mentioning)

confidence: 99%


Extended first-principles thermochemistry for the oxidation of titanium tetrachloride

Buerger

Akroyd

Kraft

2019

Combustion and Flame

Self Cite


“…Ground state geometries and vibrational frequencies for all species used in this work were calculated using density functional theory (DFT) at the B97-1/6-311+G(d,p) level of theory, as per previous works [54,56,5,67]. This functional has shown to be accurate [68,69] and well suited for transition metal complexes [70,71,72].…”

Section: Electronic Structure Calculations (mentioning)

confidence: 99%

A systematic method to estimate and validate enthalpies of formation using error-cancelling balanced reactions

Buerger

Akroyd

Mosbach

et al.

2018

Combustion and Flame

Self Cite
