2023
DOI: 10.1021/acs.jcim.3c00144

Augmenting Polymer Datasets by Iterative Rearrangement

Abstract: One of the biggest obstacles to successful polymer property prediction is an effective representation that accurately captures the sequence of repeat units in a polymer. Motivated by the success of data augmentation in computer vision and natural language processing, we explore augmenting polymer data by iteratively rearranging the molecular representation while preserving the correct connectivity, revealing additional substructural information that is not present in a single representation. We evaluate the ef…
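To illustrate the idea of rearranging a representation while preserving connectivity, here is a minimal sketch in Python. It is not the authors' implementation. It assumes a strictly linear repeat unit with no branches or rings, written as a list of one-atom tokens, and it uses `*` to mark the two attachment points. The function name `rotate_repeat_unit` and the example backbone are hypothetical. Under those assumptions, every cyclic rotation of the token list describes the same infinite chain, so each rotation is an equivalent string for the same polymer.

```python
from typing import List


def rotate_repeat_unit(tokens: List[str]) -> List[str]:
    """Return the distinct cyclic rotations of a linear repeat unit.

    Assumption: `tokens` lists the backbone atoms in chain order, with no
    branches or ring closures, so any rotation repeats to the same chain.
    Each result is capped with '*' to mark the attachment points.
    """
    rotations = []
    for shift in range(len(tokens)):
        rotated = tokens[shift:] + tokens[:shift]
        rotations.append("*" + "".join(rotated) + "*")
    # Drop duplicates (e.g. a symmetric unit) while keeping first-seen order.
    return list(dict.fromkeys(rotations))


# Hypothetical example: a three-atom ether backbone, one token per atom.
augmented = rotate_repeat_unit(["C", "C", "O"])
print(augmented)  # ['*CCO*', '*COC*', '*OCC*']
```

Each string in `augmented` would carry the same property label as the original entry, so one labeled polymer yields several training examples. Real repeat units with branches, rings, or multiple fragments need a connectivity-aware rearrangement, which this sketch does not attempt.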

citations
Cited by 6 publications
(5 citation statements)
references
References 61 publications
“…In the past, attempts to solely rely on language models for polymer property prediction tasks were hindered by the scarcity and unattainability of high-quality labeled polymer datasets,[37] while the availability of high-quality open-source polymer datasets is steadily increasing.[38][39][40][41] More encouragingly, extensive work has shown that data augmentation-based approaches are effective in addressing the scarcity of polymer data,[15,42,43] and harnessing the intelligence of general language models proves beneficial for comprehending scientific language via language models.[44][45][46][47] To the best of our knowledge, a completely end-to-end language-based approach for directly predicting the properties of polymers from natural and chemical languages, rather than being used as intermediates to connect molecular structures to downstream models, is currently lacking.…”
Section: Introduction (mentioning)
confidence: 99%
“…First, it only considered homopolymers and excluded block, ladder, and copolymers, which have shown potential in gas separation applications. Second, the dataset size derived from experimental gas permeability data was limited and contained a relatively small number of highly selective polymers, especially for CO2/N2 separation, leading to less accurate ML models. Data augmentation techniques for polymers could help address this concern. The ML model fittings produced somewhat inexact polymer predictions, which could be addressed by validation through experiments or molecular simulations.…”
Section: Results (mentioning)
confidence: 99%
“…Data augmentation techniques for polymers could help address this concern.[128] The ML model fittings produced somewhat inexact polymer predictions, which could be addressed by validation through experiments or molecular simulations. Moreover, the created polymer datasets may not encompass the entire chemical space, and inverse design methods could be employed to mitigate this limitation.…”
Section: not given (mentioning)
confidence: 99%
“…Polymers cannot be easily represented as the repeating, statistical entities that they actually are. While this is an active area of research,[31][32][33][34] we simply encoded the structure of the monomer. Additionally, regioregularity, which is a factor for both polymers and NFAs, is not easily represented in SMILES notation.…”
Section: Data Curation (mentioning)
confidence: 99%