Automatic speech recognition (ASR) automatically transcribes human speech into text. Recently, ASR systems have reached near-human performance in specific scenarios. In contrast, dysarthric speech recognition (DSR) remains a challenging task for many reasons, including unintelligible speech, irregular phoneme articulation, and the scarcity and heterogeneity of data. Most existing DSR work employs ASR systems trained on unimpaired speech to recognize impaired speech, which is impractical and inefficient. In this paper, we develop a deep convolutional recurrent neural network (CRNN) architecture and compare its performance with that of a vanilla convolutional neural network (CNN). We train both models on samples from the Torgo dataset, which contains a mix of impaired and unimpaired speech data. The experimental results show that the CRNN model attains 40.6%, against 31.4% for the vanilla CNN. This indicates that the proposed hybrid CRNN structure is effective at improving the recognition of dysarthric speech.
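
To make the hybrid structure concrete, the following is a minimal, dependency-free sketch of a CRNN forward pass: a convolution over time extracts local features from the input frames, and a recurrent layer then summarizes the resulting sequence before classification. Everything specific here is an assumption for illustration and is not taken from the paper: the function names, the layer sizes, the 13-dimensional input frames, the plain tanh recurrence (the paper's recurrent unit may differ), and the random, untrained weights.

```python
import math
import random


def conv1d_relu(frames, kernels, bias):
    """Valid 1-D convolution over the time axis, followed by ReLU.

    frames:  list of T feature vectors, each of length F
    kernels: list of C kernels, each a list of K weight vectors of length F
    returns: list of T - K + 1 vectors, each of length C
    """
    k = len(kernels[0])
    out = []
    for t in range(len(frames) - k + 1):
        row = []
        for kernel, b in zip(kernels, bias):
            s = b
            for dt in range(k):
                s += sum(w * x for w, x in zip(kernel[dt], frames[t + dt]))
            row.append(max(0.0, s))
        out.append(row)
    return out


def rnn_last_state(seq, w_in, w_rec):
    """Plain tanh recurrence (assumed here); returns the final hidden state."""
    h = [0.0] * len(w_rec)
    for x in seq:
        h = [
            math.tanh(
                sum(wi * xi for wi, xi in zip(w_in[j], x))
                + sum(wr * hi for wr, hi in zip(w_rec[j], h))
            )
            for j in range(len(h))
        ]
    return h


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def rand_matrix(rng, rows, cols, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def init_params(rng, n_feat, n_chan, kernel, n_hidden, n_classes):
    """Random, untrained weights; all sizes are hypothetical."""
    return {
        "kernels": [rand_matrix(rng, kernel, n_feat) for _ in range(n_chan)],
        "conv_bias": [0.0] * n_chan,
        "w_in": rand_matrix(rng, n_hidden, n_chan),
        "w_rec": rand_matrix(rng, n_hidden, n_hidden),
        "w_out": rand_matrix(rng, n_classes, n_hidden),
    }


def crnn_forward(frames, params):
    """Convolutional front end, recurrent summary, then class probabilities."""
    feats = conv1d_relu(frames, params["kernels"], params["conv_bias"])
    h = rnn_last_state(feats, params["w_in"], params["w_rec"])
    logits = [sum(w * hi for w, hi in zip(row, h)) for row in params["w_out"]]
    return softmax(logits)


rng = random.Random(0)
params = init_params(rng, n_feat=13, n_chan=4, kernel=3, n_hidden=8, n_classes=5)
# Made-up input: 20 frames of 13 features standing in for one utterance.
utterance = rand_matrix(rng, 20, 13, scale=1.0)
probs = crnn_forward(utterance, params)
```

In this sketch, dropping the call to `rnn_last_state` and classifying from pooled convolutional features would give a CNN-only baseline, which is the kind of comparison the abstract describes. The sketch shows only how data flows through the two stages, not how either model was trained or evaluated.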