Self-assembly of dilute sequence-defined macromolecules is a complex phenomenon in which the local arrangement of chemical moieties leads to the formation of long-range structure. The dependence of this structure on the sequence implies that a mapping between the two exists, yet this mapping has so far been difficult to model. Predicting the aggregation behavior of these macromolecules is challenging because of the lack of effective order parameters, the vast design space, inherent variability, and the high computational cost of currently available simulation techniques. Here, we accurately predict the morphology of aggregates self-assembled from sequence-defined macromolecules using supervised machine learning. We find that regression models with implicit representation learning perform significantly better than those based on engineered features such as k-mer counting, and that a regressor based on a recurrent neural network performs best. Further, we demonstrate high-throughput screening of monomer sequences using the regression model to identify candidates for self-assembly into selected morphologies. Our strategy identified multiple suitable sequences in every test we performed, and we hope the insights gained here can be extended to increasingly complex design scenarios, such as the design of sequences under polydispersity and at varying environmental conditions.
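As a rough illustration of the engineered features mentioned above, k-mer counting can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `kmer_counts`, the two-letter alphabet "AB", and the choice k = 2 are all hypothetical and chosen only for the example.

```python
from collections import Counter
from itertools import product


def kmer_counts(sequence, k, alphabet):
    """Return the count of every possible length-k substring, in a fixed order.

    Hypothetical helper for illustration; the alphabet and k are assumptions.
    """
    if k < 1 or k > len(sequence):
        raise ValueError("k must be between 1 and len(sequence)")
    # Slide a window of width k along the sequence and tally each substring.
    observed = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    # Enumerate all possible k-mers so that every sequence maps to a vector
    # of the same length, including zero counts for absent k-mers.
    vocabulary = ["".join(p) for p in product(alphabet, repeat=k)]
    return [observed[kmer] for kmer in vocabulary]


# Feature vector ordered as AA, AB, BA, BB for an assumed two-monomer alphabet.
features = kmer_counts("AABAB", k=2, alphabet="AB")
print(features)  # [1, 2, 1, 0]
```

A vector of this kind records only local composition, not where each k-mer sits along the chain, whereas a sequence model such as a recurrent neural network reads the monomers in order.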
We apply a recently developed unsupervised machine learning scheme for local atomic environments [Reinhart, Computational Materials Science, 2021, 196, 110511] to characterize large-scale, disordered aggregates formed by sequence-defined macromolecules. This method provides new insight into the structure of these disordered, dilute aggregates, which has proven difficult to understand using collective variables manually derived from expert knowledge [2]. In contrast to such conventional order parameters, we are able to classify the global aggregate structure directly using descriptions of the local environments. The resulting characterization provides a deeper understanding of the range of possible self-assembled structures and their relationships to each other. We also provide a detailed analysis of the effects of finite system size, stochasticity, and kinetics on these aggregates based on the learned collective variables. Interestingly, we find that the spatiotemporal evolution of systems in the learned latent space is smooth and continuous, despite being derived from only a single snapshot from each of about 1000 monomer sequences. These results demonstrate the insight that can be gained by applying unsupervised machine learning to soft matter systems, especially when suitable order parameters are not known.