Simply put, output representations should be valid chemical structures that respect all rules of valency and bonding, and should accurately reflect any mutations or modifications assigned to the original structure. LLMs such as GPT-3.5, GPT-4, and their chatbot adaptations, known as “ChatGPT”, offer the advantage of interpreting human instructions in a conversational format, which makes it simpler to convey abstract mutations and modifications; early performance evaluations of these models, however, have exposed their limitations. Despite demonstrating some understanding of the underlying syntax and chemistry, these models sometimes suffer from “hallucinations” in their generated SMILES strings, producing output that appears correct in formatting but, on closer examination, is either chemically invalid or subtly misaligned with the requested structure.
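The gap between surface formatting and chemical validity can be illustrated with a minimal, hypothetical checker (a sketch, not part of the evaluation described here): it verifies only that parentheses are balanced and ring-closure digits are paired, so it happily accepts a SMILES string describing a pentavalent carbon, exactly the kind of string a chemistry-aware parser would reject (e.g., RDKit's `Chem.MolFromSmiles` returns `None` for chemically invalid input).

```python
def smiles_syntax_ok(s: str) -> bool:
    """Hypothetical surface-syntax check for a SMILES string.

    Verifies only balanced parentheses and paired ring-closure
    digits; it cannot detect chemical errors such as valence
    violations, so "well-formed" here does not mean "valid molecule".
    """
    depth = 0
    open_rings = set()
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:          # closing parenthesis with no opener
                return False
        elif ch.isdigit():
            # Ring-closure labels must occur exactly twice.
            if ch in open_rings:
                open_rings.remove(ch)
            else:
                open_rings.add(ch)
    return depth == 0 and not open_rings

# Syntactically fine, chemically impossible: a pentavalent carbon.
print(smiles_syntax_ok("C(C)(C)(C)(C)C"))  # True -- the surface check passes

# Genuinely malformed strings are caught.
print(smiles_syntax_ok("C(C"))   # False -- unbalanced parenthesis
print(smiles_syntax_ok("C1CC"))  # False -- unclosed ring bond
```

This mirrors the “hallucination” failure mode: a generated string can pass every formatting check an LLM implicitly learns while still encoding an impossible molecule, which is why evaluations typically re-parse outputs with a cheminformatics toolkit rather than trusting the syntax alone.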