Reinforcement Learning (RL) is a learning paradigm in which an agent learns by interacting with an environment. Combining deep learning with RL, known as Deep Reinforcement Learning (deep RL), provides an efficient way to learn such interactions. Deep RL has achieved tremendous success in gaming, such as AlphaGo, but its potential has rarely been explored for challenging tasks like Speech Emotion Recognition (SER). Applying deep RL to SER could improve the performance of an automated call-centre agent by dynamically learning emotion-aware responses to customer queries. While the policy employed by the RL agent plays a major role in action selection, no current RL policy is tailored for SER. In addition, an extended learning period is a general challenge for deep RL, which can slow learning for SER. Therefore, in this paper, we introduce a novel policy, the "Zeta policy", which is tailored for SER, and apply pre-training in deep RL to achieve a faster learning rate. Pre-training with a cross-corpus dataset was also studied to assess the feasibility of pre-training the RL agent on a similar dataset when real environmental data are not available. The IEMOCAP and SAVEE datasets were used for evaluation, with the task being to recognise four emotions (happy, sad, angry, and neutral) in the given utterances. The experimental results show that the proposed "Zeta policy" outperforms existing policies. They also confirm that pre-training reduces training time and is robust in a cross-corpus scenario.
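Since the abstract centres on the role of the RL policy in action selection, the following minimal sketch shows a standard epsilon-greedy policy, a common baseline against which a new policy would be compared. The "Zeta policy" itself is not specified in the abstract, so it is not reproduced here; the four actions are taken to correspond to the paper's four emotion classes, and the Q-values are purely illustrative.

```python
import random

# Illustrative epsilon-greedy action selection: a common baseline deep RL
# policy (NOT the paper's Zeta policy, which the abstract does not define).
# With probability epsilon the agent explores (random action); otherwise it
# exploits (action with the highest estimated Q-value).
def epsilon_greedy(q_values, epsilon, rng=random):
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])  # exploit

# Hypothetical Q-values over the four emotion classes used in the paper:
# index 0 = happy, 1 = sad, 2 = angry, 3 = neutral.
q = [0.1, 0.4, 0.2, 0.3]
action = epsilon_greedy(q, epsilon=0.0)  # epsilon=0 is purely greedy
print(action)  # → 1 (the "sad" action has the highest Q-value)
```

With epsilon = 0 the policy is purely greedy; raising epsilon trades exploitation for exploration, which is exactly the behaviour a tailored policy like the one proposed would modify.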