2023
DOI: 10.1039/d2dd00087c

Assessment of chemistry knowledge in large language models that generate code

Abstract: In this work, we investigate the question: do code-generating large language models know chemistry? Our results indicate: mostly yes. To evaluate this, we introduce an expandable framework for evaluating chemistry...

Cited by 43 publications (48 citation statements)
References 41 publications (60 reference statements)
“…Simply put, output representations should be valid chemical structures, respecting all rules of valency and bonding, and should accurately reflect any assigned mutations or modifications from the original structure. While LLMs like GPT-3.5, GPT-4, and their chatbot adaptations, known as “ChatGPT”, offer the advantage of interpreting human instructions in a conversational format, which makes it simpler to convey abstract mutations and modifications, the early performance evaluations of these models have shown their limitations. Despite demonstrating certain levels of understanding of the underlying syntax and chemistry, these models sometimes suffer from “hallucinations” in their generated SMILES strings, which appear correct in formatting but are either chemically invalid or slightly misaligned when closely examined.…”
Section: Results and Discussion (mentioning, confidence: 99%)
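The distinction drawn in the statement above, between SMILES strings that are well formatted and those that are chemically valid, can be checked programmatically. A minimal sketch, assuming the third-party RDKit library is installed; the helper name `is_valid_smiles` is illustrative, not from the cited work:

```python
# Sketch: distinguish chemically valid SMILES from strings that merely look
# like SMILES. RDKit's Chem.MolFromSmiles returns None when parsing or
# valence sanitization fails, which catches the "hallucinated" outputs the
# quoted statement describes. Requires the rdkit package (pip install rdkit).
from rdkit import Chem


def is_valid_smiles(smiles: str) -> bool:
    """Return True if the string parses into a sanitized RDKit molecule."""
    return Chem.MolFromSmiles(smiles) is not None


# "c1ccccc1" (benzene) parses cleanly; "C1CC" has an unclosed ring bond,
# so it is well formed character-by-character but not a valid structure.
print(is_valid_smiles("c1ccccc1"))  # True
print(is_valid_smiles("C1CC"))      # False
```

A check like this only catches syntactic and valence errors; it cannot detect the subtler failure mode mentioned above, where a string is valid but slightly misaligned with the requested modification.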
“…LLMs are nowadays applied to interpret questions in chemistry subjects and answer them to understand if LLMs can comprehend chemistry. Although researchers have recently stated that they found high accuracy on chemistry questions using some tricks, it is presented here in five tasks that the accuracy in answering the questions was between 25% and 100% without any tricks. The low or high accuracy depends on several important considerations: reasonable prompts should give correct answers, questions on popular subjects are easily answered, very specific topics that are not well included in a database or are not well trained in the model give low accuracy, and the development of better prompts or strategies for training and fitting this knowledge in models might yield better results…”
Section: Discussion (mentioning, confidence: 99%)
“…In this viewpoint, we attempted to mimic a regular student prompting the ChatGPT model to answer questions on chemistry subjects without using any tricks such as inserting copyright notices in source files or fine-tuning with human feedback. Although aligning language models with human intent is a promising direction to get correct answers, care must be taken when using completions with difficult prompts. It is also noted that LLMs always answer something.…”
Section: Discussion (mentioning, confidence: 99%)
“…Thus, instructors may choose which version to use based on student background or on the programming language used in their course or department. Writing the programs to perform basic chemical data analysis tasks like these might soon be handled using interactive large language models (LLMs) such as GPT-3, and so the most important goal is to help students to think critically about reading, modifying, and debugging code, independent of language.…”
Section: Methods (mentioning, confidence: 99%)