Gene Expression Microarray (GEM) data is biological data that contains valuable hidden information about genes. Information extracted from variations in gene expression levels is used for disease detection and diagnosis, especially for cancer classification. Because GEM datasets are large, highly redundant, and imbalanced, cancer classification accuracy tends to be low, and it is difficult to identify suitable features. Hence, this paper uses Grey Wolf Optimization (GWO) to select features from the GEM data. A Convolutional Neural Network with Long Short-Term Memory (ConvLSTM) is developed that uses Deep Reinforcement Learning (DRL) to select appropriate features and parameters for efficient cancer classification. The ConvLSTM model converts low-level features into high-level ones by identifying distributed data representations. DRL iteratively optimizes the ConvLSTM parameters, which significantly affects the overall learning process of the prediction model. Within DRL, a Double Deep Q-Network (DDQN) is introduced to reduce the overestimation of action values during training. Finally, a loss function is applied in the neural network (NN) of the ConvLSTM for accurate cancer detection and diagnosis. The proposed model is termed Improved ConvLSTM using DDQN (ICL-DDQN). ICL-DDQN achieves accuracies of 92%, 91.67%, and 92.22% on the breast cancer, leukemia, and lung cancer datasets, respectively, which are 32.69%, 57.16%, and 23.89% higher than 1D-CNN; 21.06%, 43.18%, and 16.89% higher than DL-DCGN; 15%, 28.33%, and 10.18% higher than DL-SAE; and 6.15%, 132.79%, and 4.83% higher than DL-AAA on the respective datasets. The proposed model effectively detects cancer at an early stage, reducing manual inspection effort and time for doctors and enabling more effective treatment.
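The abstract credits DDQN with reducing the overestimation of action values. The sketch below illustrates that idea using the standard Double DQN target. It is a generic illustration, not the ICL-DDQN implementation. The function names, the array shapes, and the discount factor `gamma` are hypothetical choices. The abstract does not specify how states, actions, and rewards map onto feature and ConvLSTM parameter selection. The key point is that the online network *selects* the next action while the target network *evaluates* it. Plain DQN instead takes a max over the target network's own estimates, which biases targets upward.

```python
import numpy as np


def double_dqn_targets(rewards, next_q_online, next_q_target, dones, gamma=0.99):
    """Hypothetical helper: Double DQN regression targets for a batch.

    rewards:       shape (B,)   immediate rewards
    next_q_online: shape (B, A) next-state Q-values from the online network
    next_q_target: shape (B, A) next-state Q-values from the target network
    dones:         shape (B,)   1.0 if the transition ended an episode, else 0.0
    """
    rewards = np.asarray(rewards, dtype=float)
    dones = np.asarray(dones, dtype=float)
    next_q_online = np.asarray(next_q_online, dtype=float)
    next_q_target = np.asarray(next_q_target, dtype=float)
    # Selection: the online network picks the greedy next action.
    best_actions = np.argmax(next_q_online, axis=1)
    # Evaluation: the target network scores the action chosen above.
    evaluated = next_q_target[np.arange(len(best_actions)), best_actions]
    # Terminal transitions do not bootstrap from the next state.
    return rewards + gamma * (1.0 - dones) * evaluated


def vanilla_dqn_targets(rewards, next_q_target, dones, gamma=0.99):
    """Hypothetical helper: standard DQN targets, where max both selects and evaluates."""
    rewards = np.asarray(rewards, dtype=float)
    dones = np.asarray(dones, dtype=float)
    next_q_target = np.asarray(next_q_target, dtype=float)
    return rewards + gamma * (1.0 - dones) * np.max(next_q_target, axis=1)


if __name__ == "__main__":
    q_online = [[1.0, 2.0], [0.5, 0.1]]
    q_target = [[3.0, 1.5], [0.2, 0.4]]
    print(double_dqn_targets([1.0, 0.0], q_online, q_target, [0.0, 1.0], gamma=0.9))
    print(vanilla_dqn_targets([1.0, 0.0], q_target, [0.0, 1.0], gamma=0.9))
```

For the same batch, the Double DQN target is never larger than the plain DQN target. The value it bootstraps from is one entry of the target network's row, while plain DQN uses that row's maximum. This decoupling is what curbs overestimation.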