In this work, we aim to develop novel cybersecurity playbooks by exploiting dynamic reinforcement learning (RL) methods to close holes in the attack surface left open by the traditional signature-based approach to Defensive Cyber Operations (DCO). A useful first proof-of-concept is provided by the problem of training a scanning defense agent using RL; as a first line of defense, it is important to protect sensitive networks from network mapping tools. To address this challenge, we developed a hierarchical, Monte Carlo-based RL framework for the training of an autonomous agent which detects and reports the presence of Nmap scans in near real-time, efficiently and with near-perfect accuracy. Our algorithm is powered by a reduction of the state space given by a transformer, CLAPBAC, an anomaly detection tool which applies natural language processing to cybersecurity in a manner consistent with state-of-the-art. In a realistic scenario emulated in CyberVAN, our approach generates optimized playbooks for effective defense against malicious insiders inappropriately probing sensitive networks.