The drivers of reinforcement learning (RL) beyond reward remain controversial. Novelty and surprise are often used equivocally in this debate. Here, using a deep sequential decision-making paradigm, we show that reward, novelty, and surprise play different roles in human RL. Surprise controls the rate of learning, whereas novelty and the novelty prediction error (NPE) drive exploration. Exploitation is dominated by model-free (habitual) action choices. A theory that takes these separate effects into account predicts, on average, 73% of the action choices of human participants after their first encounter with a reward, and allows us to dissociate surprise and novelty in the EEG signal. While the event-related potential (ERP) at around 300 ms is positively correlated with surprise, novelty, NPE, reward, and the reward prediction error, the ERP response to novelty and NPE starts earlier than the response to surprise.