Simply put, output representations should be valid chemical structures that respect all rules of valency and bonding, and should accurately reflect any mutations or modifications assigned to the original structure. LLMs such as GPT-3.5, GPT-4, and their chatbot adaptations, known as “ChatGPT”, offer the advantage of interpreting human instructions in a conversational format, which makes it simpler to convey abstract mutations and modifications; early performance evaluations of these models, however, have exposed their limitations. Despite demonstrating some understanding of the underlying syntax and chemistry, these models sometimes suffer from “hallucinations” in their generated SMILES strings, producing output that appears correct in formatting but, on closer examination, is either chemically invalid or subtly misaligned with the requested structure.
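The gap between surface formatting and chemical validity can be illustrated with a minimal, hypothetical checker (a sketch, not part of the evaluation described here): it verifies only that parentheses are balanced and ring-closure digits are paired, so it happily accepts a SMILES string describing a pentavalent carbon, exactly the kind of string a chemistry-aware parser would reject (e.g., RDKit's `Chem.MolFromSmiles` returns `None` for chemically invalid input).

```python
def smiles_syntax_ok(s: str) -> bool:
    """Hypothetical surface-syntax check for a SMILES string.

    Verifies only balanced parentheses and paired ring-closure
    digits; it cannot detect chemical errors such as valence
    violations, so "well-formed" here does not mean "valid molecule".
    """
    depth = 0
    open_rings = set()
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:          # closing parenthesis with no opener
                return False
        elif ch.isdigit():
            # Ring-closure labels must occur exactly twice.
            if ch in open_rings:
                open_rings.remove(ch)
            else:
                open_rings.add(ch)
    return depth == 0 and not open_rings

# Syntactically fine, chemically impossible: a pentavalent carbon.
print(smiles_syntax_ok("C(C)(C)(C)(C)C"))  # True -- the surface check passes

# Genuinely malformed strings are caught.
print(smiles_syntax_ok("C(C"))   # False -- unbalanced parenthesis
print(smiles_syntax_ok("C1CC"))  # False -- unclosed ring bond
```

This mirrors the “hallucination” failure mode: a generated string can pass every formatting check an LLM implicitly learns while still encoding an impossible molecule, which is why evaluations typically re-parse outputs with a cheminformatics toolkit rather than trusting the syntax alone.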