Large language models (LLMs) take advantage of step-by-step reasoning instructions, e.g., chain-of-thought (CoT) prompting. Building on this, their ability to perform CoT-style reasoning robustly is of interest from a probing perspective. In this study, we inspect the step-by-step reasoning ability of LLMs with a focus on negation, which is a core linguistic phenomenon that is difficult to process. In particular, we introduce several controlled settings (e.g., reasoning on fictional entities) to evaluate the logical reasoning abilities of the models. We observed that dozens of modern LLMs were not robust against lexical negation (e.g., plausible→implausible) when performing CoT-style reasoning, and the results highlight unique limitations in each LLM family.

https://github.com/muyo8692/stepbystep-reasoning-vs-negation

Setting: BASE
Few-shot exemplars: Is a sentence "A does B" plausible? A is a C player. B happens in C/X. So the answer is yes/no.
Target example: Is a sentence "D does E" plausible? D is a F player. E happens in F/Y. So the answer is __
If the model fails at this setting: CoT-style reasoning fails.

Setting: FIC
Few-shot exemplars: Is a sentence "A does B" plausible? A is a C player. B happens in C/X. So the answer is yes/no.
Target example: Is a sentence "α does β" plausible? α is a γ player. β happens in γ/χ. So the answer is __
If the model fails at this setting: Reasoning cannot be abstracted to fictional texts.

Setting: FICNEG
Few-shot exemplars: Is a sentence "A does B" implausible? A is a C player. B happens in C/X. So the answer is yes/no.
Target example: Is a sentence "α does β" implausible? α is a γ player. β happens in γ/χ. So the answer is __
If the model fails at this setting: Reasoning is not robust against lexical negation.
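To make the prompt construction concrete, the following is a minimal sketch of how the BASE/FIC/FICNEG prompts could be assembled. The helper name build_prompt and the entity placeholders are illustrative, not the authors' implementation; only the template wording follows the table above, and the C/X and yes/no alternations in the table (paired positive/negative exemplars) are simplified here to a single positive exemplar.

```python
# Illustrative sketch: assemble few-shot prompts for the three settings.
# build_prompt and the placeholder entities are hypothetical; the template
# wording mirrors the settings table above.

EXEMPLAR = (
    'Is a sentence "{s} does {o}" {adj}? '
    '{s} is a {c} player. {o} happens in {c}. '
    'So the answer is {ans}.'
)
TARGET = (
    'Is a sentence "{s} does {o}" {adj}? '
    '{s} is a {c} player. {o} happens in {c}. '
    'So the answer is'
)

def build_prompt(setting: str) -> str:
    """Assemble a few-shot prompt for BASE, FIC, or FICNEG."""
    # FICNEG flips the question word (plausible -> implausible)
    # in both the exemplar and the target.
    adj = "implausible" if setting == "FICNEG" else "plausible"
    # BASE queries a real-style entity; FIC/FICNEG query fictional ones
    # (Greek-letter placeholders, as in the table).
    if setting == "BASE":
        tgt = dict(s="D", o="E", c="F")
    else:
        tgt = dict(s="α", o="β", c="γ")
    exemplar = EXEMPLAR.format(s="A", o="B", c="C", adj=adj, ans="yes")
    target = TARGET.format(adj=adj, **tgt)
    return exemplar + "\n" + target

for setting in ("BASE", "FIC", "FICNEG"):
    print(f"--- {setting} ---\n{build_prompt(setting)}\n")
```

Under this sketch, a model that completes the BASE prompt correctly but fails once the entities become fictional (FIC) or the question is lexically negated (FICNEG) exhibits exactly the failure modes the table distinguishes.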