Many chemical manufacturing and separations processes, such as solvent extraction, comprise hierarchically complex configurations of functional process units. As complexity increases, strategies that rely on heuristics become less reliable for design optimization. In this study, we explore deep reinforcement learning for mapping the space of feasible designs to find an optimization strategy that can match or exceed the performance of conventional optimization. To this end, we implement a highly configurable learning environment for the solvent design process, to which we can couple state‐of‐the‐art deep reinforcement learning agents. We evaluate the trained agents against heuristic optimization on the solvent process design task of optimizing recovery efficiency and product purity. The results demonstrate that the agent successfully learned a strategy for predicting comparably optimal solvent extraction process designs for varying combinations of feed compositions.