2022
DOI: 10.48550/arxiv.2204.11953
Preprint

Crystal Transformer: Self-learning neural language model for Generative and Tinkering Design of Materials

Abstract: Self-supervised neural language models have recently achieved unprecedented success, from natural language processing to learning the languages of biological sequences and organic molecules. These models have demonstrated superior performance in the generation, structure classification, and functional predictions for proteins and molecules with learned representations. However, most of the masking-based pre-trained language models are not designed for generative design, and their black-box nature makes it diff…

Cited by 4 publications (8 citation statements) · References 51 publications (71 reference statements)
“…While the Hybrid-mix dataset may contain a certain amount of materials that are not charge neutral or do not have balanced electronegativity (EB), the Hybrid-pure dataset has samples selected from the Hybrid-mix dataset that are charge neutral and have EB. The Hybrid-strict dataset is obtained with a similar method to the Hybrid-pure dataset, but we use the strict ICSD oxidation states of the elements to calculate the charge neutrality (CN) and EB, which imposes stricter constraints than the Hybrid-pure dataset and the dataset used in our previous study [31].…”
Section: Dataset (mentioning)
Confidence: 99%
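
A minimal sketch of the charge-neutrality (CN) and electronegativity-balance (EB) screening described above, assuming a composition is represented as an element-to-count mapping. The oxidation states, electronegativities, and the helper passes_cn_and_eb are illustrative stand-ins, not the actual filtering code or the ICSD tables used for the Hybrid-pure and Hybrid-strict datasets:

from itertools import product

# Illustrative data only (assumption): a few common oxidation states and
# Pauling electronegativities, standing in for the full ICSD-derived tables.
OXIDATION_STATES = {"Sr": [2], "Ti": [2, 3, 4], "O": [-2]}
ELECTRONEGATIVITY = {"Sr": 0.95, "Ti": 1.54, "O": 3.44}

def passes_cn_and_eb(composition):
    """Return True if some oxidation-state assignment of the composition
    (element -> count) is charge neutral and every cation is less
    electronegative than every anion."""
    elements = list(composition)
    for states in product(*(OXIDATION_STATES[e] for e in elements)):
        total_charge = sum(q * composition[e] for e, q in zip(elements, states))
        if total_charge != 0:
            continue  # not charge neutral under this assignment
        cations = [e for e, q in zip(elements, states) if q > 0]
        anions = [e for e, q in zip(elements, states) if q < 0]
        if all(ELECTRONEGATIVITY[c] < ELECTRONEGATIVITY[a]
               for c in cations for a in anions):
            return True  # charge neutral and electronegativity balanced
    return False

print(passes_cn_and_eb({"Sr": 1, "Ti": 1, "O": 3}))  # SrTiO3 -> True
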
“…The group includes four GPT-series LMs, BART [17], and RoBERTa [46]. In addition, we also use our previous work, BLMM [31], in our experiments to show its performance.…”
Section: Pretrain Transformer LMs for Materials Composition Generation (mentioning)
Confidence: 99%
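
For context, a hedged sketch of how a GPT-style causal LM can extend a partial composition written as space-separated element tokens, using the Hugging Face transformers API. The stock "gpt2" checkpoint and the prompt format are assumptions for illustration only; the cited works pre-train their own LMs on materials corpora:

from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical setup (assumption): stock GPT-2 rather than a materials-specific LM.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Seed with a partial composition written as space-separated element tokens.
prompt = "Sr Ti"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=8,                     # extend the composition by a few tokens
    do_sample=True,                       # sample instead of greedy decoding
    top_k=50,
    pad_token_id=tokenizer.eos_token_id,  # silence the missing-pad-token warning
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
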
“…[27] proposed a blank language model (BLM) which could generate sequences by dynamically creating and filling in blanks. Our BLMM composition generator [28] is developed based on the BLM blank-filling model. All material formulas can be rewritten as sequences (e.g., SiTiO3 to Si Ti O O O) composed of a vocabulary with 118 or fewer elements.…”
Section: BLMM: Transformer-Based 2D Materials Composition Generation (mentioning)
Confidence: 99%
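
A minimal sketch of the formula-to-sequence rewriting mentioned in this citation (e.g., SiTiO3 to Si Ti O O O), assuming simple unnested formulas; the regex parser and the formula_to_sequence helper are illustrative, not the paper's tokenizer:

import re

# Element symbol (capital letter plus optional lowercase letter)
# followed by an optional integer count; no nested parentheses.
FORMULA_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")

def formula_to_sequence(formula):
    """Expand a formula like 'SiTiO3' into the element sequence 'Si Ti O O O'."""
    tokens = []
    for element, count in FORMULA_TOKEN.findall(formula):
        tokens.extend([element] * int(count or "1"))
    return " ".join(tokens)

print(formula_to_sequence("SiTiO3"))  # Si Ti O O O
print(formula_to_sequence("SrTiO3"))  # Sr Ti O O O
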