2022
DOI: 10.1039/d1me00160d

Featurization strategies for polymer sequence or composition design by machine learning

Abstract: In this work, we present, evaluate, and analyze strategies for representing polymer chemistry to machine learning models for the advancement of data-driven sequence or composition design of macromolecules.

Citations: cited by 50 publications (90 citation statements)
References: 90 publications
“…All copolymers were featurized as DP‐explicit composition vectors with one‐hot encoding vectors used as fingerprints for monomer units. [ 36 ] With eight possible monomers, the resulting feature vector possesses nine dimensions, with the first containing the DP of the copolymer divided by 200 and the remaining eight containing the fractions of incorporation for each monomer; the division in the first dimension represents DP on a similar scale as the remaining features. Gaussian process regression (GPR) models, trained to predict the Yeo–Johnson transformation [ 51 ] of the REA for a PPH, were preferred due to their superior predictive performance compared to other ML algorithms (Figure , Supporting Information).…”
Section: Methods (mentioning)
confidence: 99%
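
The DP-explicit composition vector described in the statement above can be sketched as follows. This is a minimal illustration, not the citing authors' code: the monomer labels `M1` to `M8` and the function name `featurize` are hypothetical, and only the divisor of 200 and the nine-dimensional layout come from the quoted text. Because each monomer fingerprint is a one-hot vector, the fraction-weighted sum of fingerprints is simply the vector of incorporation fractions.

```python
# Hypothetical monomer labels; the quoted passage only states that there are eight.
MONOMERS = ["M1", "M2", "M3", "M4", "M5", "M6", "M7", "M8"]
DP_SCALE = 200.0  # divisor stated in the quoted passage


def featurize(dp, fractions):
    """Return the 9-dimensional vector [DP / 200, f_1, ..., f_8].

    `fractions` maps monomer label -> fraction of incorporation;
    monomers that are absent from the mapping get a fraction of 0.
    """
    unknown = set(fractions) - set(MONOMERS)
    if unknown:
        raise ValueError(f"unknown monomers: {sorted(unknown)}")
    if abs(sum(fractions.values()) - 1.0) > 1e-8:
        raise ValueError("fractions of incorporation must sum to 1")
    return [dp / DP_SCALE] + [fractions.get(m, 0.0) for m in MONOMERS]


# Example: a copolymer with DP = 100 built from equal parts M1 and M3.
vec = featurize(100, {"M1": 0.5, "M3": 0.5})
print(vec)
```

Dividing DP by 200 keeps the first entry on roughly the same scale as the fractions, which is the rationale given in the quoted passage.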
“…Stoichiometry can be considered by taking the sum of monomer fingerprints, weighted by the respective ratios. 32,57 Alternatively, count fingerprints, which use vectors of integer values and capture the frequency of different chemical patterns, can be applied to oligomeric molecules constructed in a way to reflect the monomers' stoichiometry. By constructing a short polymer chain, the resulting count fingerprints also capture aspects of the polymer's chain architecture.…”
Section: Prior Work On Polymer Representations As Model Baselines (mentioning)
confidence: 99%
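
The ratio-weighted sum of monomer fingerprints mentioned in the statement above can be sketched as below. The fingerprints here are toy 4-bit vectors invented for illustration, the function name `weighted_fingerprint` is hypothetical, and normalizing the ratios so they sum to 1 is an assumption rather than something the quoted text specifies.

```python
def weighted_fingerprint(fingerprints, ratios):
    """Combine per-monomer fingerprints into one polymer-level vector.

    Each fingerprint is a list of numbers of equal length; `ratios` gives
    the stoichiometric ratio of each monomer. Ratios are normalized to sum
    to 1 (an assumption made for this sketch).
    """
    if len(fingerprints) != len(ratios):
        raise ValueError("need one ratio per fingerprint")
    total = sum(ratios)
    if total <= 0:
        raise ValueError("ratios must sum to a positive value")
    length = len(fingerprints[0])
    if any(len(fp) != length for fp in fingerprints):
        raise ValueError("fingerprints must all have the same length")
    weights = [r / total for r in ratios]
    return [
        sum(w * fp[i] for w, fp in zip(weights, fingerprints))
        for i in range(length)
    ]


# Toy 4-bit fingerprints for two monomers combined in a 3:1 ratio.
fp_a = [1, 0, 1, 0]
fp_b = [0, 0, 1, 1]
print(weighted_fingerprint([fp_a, fp_b], [3, 1]))  # [0.75, 0.0, 1.0, 0.25]
```

A bit shared by both monomers stays at 1.0, while bits unique to one monomer are scaled by that monomer's share of the composition.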
“…88 Under the context of known polymer sequence or composition information, further featurization strategies for copolymer systems also include one-hot encoding of constitutional units, molecular fingerprints and descriptor vectors. 89 The monomer structure of polymers can be modeled by molecular graphs comprising sets of nodes (atoms) and edges (chemical bonds) before the training of the graph convolutional network. 90…”
Section: Future Directions (mentioning)
confidence: 99%
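
The molecular-graph view in the statement above (atoms as nodes, bonds as edges) can be sketched with plain lists. This is an illustrative assumption, not the cited work's pipeline: the example is a hand-written heavy-atom graph of vinyl chloride (C=C-Cl) with hydrogens omitted and bond orders ignored, and the helper name `build_graph` is hypothetical. The resulting node-feature and adjacency matrices are the kind of inputs a graph convolutional network typically consumes.

```python
def build_graph(atoms, bonds, elements):
    """Return (node_features, adjacency) for a small molecule.

    node_features: one-hot element encoding per atom.
    adjacency: symmetric 0/1 matrix with a 1 for every bonded atom pair.
    """
    node_features = [[1 if a == e else 0 for e in elements] for a in atoms]
    n = len(atoms)
    adjacency = [[0] * n for _ in range(n)]
    for i, j in bonds:
        adjacency[i][j] = 1
        adjacency[j][i] = 1  # bonds are undirected
    return node_features, adjacency


# Heavy atoms of vinyl chloride, indexed 0..2, and the bonds between them.
atoms = ["C", "C", "Cl"]
bonds = [(0, 1), (1, 2)]
node_features, adjacency = build_graph(atoms, bonds, elements=["C", "Cl"])
print(node_features)  # [[1, 0], [1, 0], [0, 1]]
print(adjacency)      # [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
```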