Deep reinforcement learning (DRL) has advanced robot manipulation by offering an alternative way to design control strategies that take raw images directly as input. Although an image usually carries richer knowledge about the environment, it requires the policy to perform representation learning and task learning simultaneously, which is sample-inefficient. Previous approaches, such as Variational Autoencoder (VAE) based DRL algorithms, address this problem by learning a visual representation model that encodes the entire image into a low-dimensional vector. However, since the vector contains both robot and object information, coupling within the state is inevitable, which can mislead the training of the DRL policy. In this study, a novel method named Reinforcement Learning with Decoupled State Representation (RLDS) is proposed to decouple the robot and object information, thereby increasing learning efficiency and effectiveness. Experimental results show that the proposed method learns faster and achieves better performance than previous methods on several typical robot tasks. Moreover, with only 3,096 offline images, the proposed method can be successfully applied to a real-robot pushing task, demonstrating its high practicability.