Augmenting Polymer Datasets by Iterative Rearrangement

Lo, Siu-Ming; Seifrid, Martin; Gaudin, Théophile; Aspuru‐Guzik, Alán

doi:10.1021/acs.jcim.3c00144

Cited by 6 publications

(5 citation statements)

References 61 publications


“…In the past, attempts to solely rely on language models for polymer property prediction tasks were hindered by the scarcity and unattainability of high-quality labeled polymer datasets, [37] while the availability of high-quality open-source polymer datasets is steadily increasing. [38-41] More encouragingly, extensive work has shown that data augmentation-based approaches are effective in addressing the scarcity of polymer data, [15,42,43] and harnessing the intelligence of general language models proves beneficial for comprehending scientific language via language models. [44-47] To the best of our knowledge, a completely end-to-end language-based approach for directly predicting the properties of polymers from natural and chemical languages, rather than being used as intermediates to connect molecular structures to downstream models, is currently lacking.…”

Section: Introduction (mentioning)

confidence: 99%

PolyNC: a natural and chemical language model for the prediction of unified polymer properties

Qiu, Liu, Qiu et al. 2024

Chem. Sci.


“…First, it only considered homopolymers and excluded block, ladder, and copolymers, which have shown potential in gas separation applications. Second, the dataset size derived from experimental gas permeability data was limited and contained a relatively small number of highly selective polymers, especially for CO2/N2 separation, leading to less accurate ML models. Data augmentation techniques for polymers could help address this concern. The ML model fittings produced somewhat inexact polymer predictions, which could be addressed by validation through experiments or molecular simulations.…”

Section: Results (mentioning)

confidence: 99%

“…Data augmentation techniques for polymers could help address this concern. [128] The ML model fittings produced somewhat inexact polymer predictions, which could be addressed by validation through experiments or molecular simulations. Moreover, the created polymer datasets may not encompass the entire chemical space, and inverse design methods could be employed to mitigate this limitation.…”

Section: (mentioning)

confidence: 99%

Creation of Polymer Datasets with Targeted Backbones for Screening of High-Performance Membranes for Gas Separation

Tiwari, Shi, Budhathoki et al. 2024

J. Chem. Inf. Model.


A simple approach was developed to computationally construct a polymer dataset by combining simplified molecular-input line-entry system (SMILES) strings of a targeted polymer backbone and a variety of molecular fragments. This method was used to create 14 polymer datasets by combining seven polymer backbones and molecules from two large molecular datasets (MOSES and QM9). Polymer backbones that were studied include four polydimethylsiloxane (PDMS) based backbones, poly(ethylene oxide) (PEO), poly(allyl glycidyl ether) (PAGE), and polyphosphazene (PPZ). The generated polymer datasets can be used for various cheminformatics tasks, including high-throughput screening for gas permeability and selectivity. This study utilized machine learning (ML) models to screen the polymers for CO2/CH4 and CO2/N2 gas separation using membranes. Several polymers of interest were identified. The results highlight that employing an ML model fitted to polymer selectivities leads to higher accuracy in predicting polymer selectivity compared to using the ratio of predicted permeabilities.
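The dataset construction described in this abstract, pairing a backbone SMILES string with many fragment SMILES strings, can be illustrated with a minimal sketch. This is not the authors' code: the `{R}` attachment marker, the `combine` function, the two backbone templates, and the three fragments are all hypothetical choices made for illustration, and the sketch assumes each fragment attaches through its first atom and shares no ring-closure digits with the backbone.

```python
from itertools import product

# Hypothetical marker for the position where a fragment is attached.
PLACEHOLDER = "{R}"


def combine(backbones, fragments):
    """Pair every backbone template with every fragment via string substitution.

    backbones: dict mapping a label to a repeat-unit SMILES template that
               contains PLACEHOLDER inside a branch, e.g. "[*]CC({R})O[*]".
    fragments: list of fragment SMILES, assumed to bond through their first atom.
    """
    records = []
    for (name, template), fragment in product(backbones.items(), fragments):
        if PLACEHOLDER not in template:
            raise ValueError(f"backbone {name!r} has no {PLACEHOLDER} marker")
        smiles = template.replace(PLACEHOLDER, fragment)
        records.append({"backbone": name, "fragment": fragment, "smiles": smiles})
    return records


# Illustrative inputs only; [*] marks the ends of the repeat unit.
backbones = {
    "PEO": "[*]CC({R})O[*]",
    "PDMS": "[*][Si](C)({R})O[*]",
}
fragments = ["C", "CC", "c1ccccc1"]

dataset = combine(backbones, fragments)
for record in dataset:
    print(record["backbone"], record["fragment"], record["smiles"])
```

A real pipeline would likely validate and canonicalize each generated string with a cheminformatics toolkit before use; plain string substitution does not check chemical validity.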


“…Polymers cannot be easily represented as the repeating, statistical entities that they actually are. While this is an active area of research, [31-34] we simply encoded the structure of the monomer. Additionally, regioregularity, which is a factor for both polymers and NFAs, is not easily represented in SMILES notation.…”

Section: Data Curation (mentioning)

confidence: 99%

Beyond molecular structure: critically assessing machine learning for designing organic photovoltaic materials and devices

Seifrid, Lo, Choi et al. 2024

J. Mater. Chem. A

Self Cite
