This paper presents a comparative analysis of two convolutional neural network (CNN) architectures for gesture recognition on the ArSL2018 dataset, a significant resource comprising 54,049 images across 32 classes of Arabic Alphabet Sign Language (ArASL). Our goal is to identify the most effective approach for facilitating communication within the Arabic-speaking deaf community, thereby enhancing their interaction with digital platforms and everyday technology interfaces. The first architecture employs a pre-trained MobileNetV2 model as a feature extractor followed by a fully connected layer, while the second extends MobileNetV2 with additional convolutional and pooling layers. Under rigorous evaluation with multiple metrics, including accuracy, precision, recall, and F1-score, the first architecture achieved a higher overall test-set accuracy of 95% versus 93.85% for the second, with per-class accuracies ranging from 82.91% to 99.10%. These findings suggest that simpler CNN architectures built on pre-trained feature extractors are not only effective but also potentially more efficient to integrate into assistive technologies. This study underscores the potential of gesture recognition systems to improve the quality of life of deaf and hard-of-hearing users by providing more natural, intuitive ways to interact with technology. By focusing on user-centric design and ethical AI deployment, our findings contribute to the broader discourse on developing responsible, inclusive technologies that uphold human dignity and foster social inclusion.
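The two architectures can be sketched roughly as follows in Keras. This is a minimal illustration, not the authors' published code: the head layer widths, kernel sizes, and the choice to freeze the backbone are assumptions; in practice the backbone would load `weights="imagenet"` rather than `weights=None`.

```python
# Hypothetical sketch of the two compared architectures (not the authors' exact code).
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 32  # ArSL2018 covers 32 ArASL letter classes


def build_arch1(input_shape=(224, 224, 3)):
    """Architecture 1: MobileNetV2 feature extractor + a fully connected classifier."""
    base = tf.keras.applications.MobileNetV2(
        input_shape=input_shape, include_top=False, weights=None)  # weights="imagenet" in practice
    base.trainable = False  # use the backbone purely as a feature extractor
    return models.Sequential([
        base,
        layers.GlobalAveragePooling2D(),
        layers.Dense(NUM_CLASSES, activation="softmax"),
    ])


def build_arch2(input_shape=(224, 224, 3)):
    """Architecture 2: MobileNetV2 extended with extra convolutional and pooling layers."""
    base = tf.keras.applications.MobileNetV2(
        input_shape=input_shape, include_top=False, weights=None)
    base.trainable = False
    return models.Sequential([
        base,
        layers.Conv2D(256, 3, padding="same", activation="relu"),  # assumed width
        layers.MaxPooling2D(pool_size=2),
        layers.GlobalAveragePooling2D(),
        layers.Dense(NUM_CLASSES, activation="softmax"),
    ])
```

The design trade-off reported above shows up directly here: architecture 1 adds only a pooling and a dense layer on top of the backbone, so it has fewer trainable parameters than architecture 2 and is correspondingly cheaper to fine-tune and deploy on resource-constrained assistive devices.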