“…In the past, attempts to rely solely on language models for polymer property prediction were hindered by the scarcity and inaccessibility of high-quality labeled polymer datasets, [37] but the availability of high-quality open-source polymer datasets is now steadily increasing. [38][39][40][41] More encouragingly, extensive work has shown that data augmentation-based approaches are effective in addressing the scarcity of polymer data, [15,42,43] and that harnessing the intelligence of general language models is beneficial for comprehending scientific language. [44][45][46][47] To the best of our knowledge, a completely end-to-end language-based approach that directly predicts polymer properties from natural and chemical languages, rather than using language models as intermediates to connect molecular structures to downstream models, is currently lacking.…”