In Generalized Zero-Shot Learning (GZSL), the challenge lies in learning attribute-based information from seen classes and transferring this knowledge to recognize both seen and unseen categories at test time. This paper proposes an approach to improve the generalization ability and efficiency of GZSL models by integrating a Convolutional Block Attention Module (CBAM). CBAM combines channel-wise and spatial-wise attention to emphasize key features, thereby improving the model's discriminative and localization capabilities. The method further employs a ResNet101 backbone for systematic image feature extraction, enhanced contrastive learning, and a similarity-map generator built on attribute prototypes, forming a framework for robust visual–semantic embedding in classification tasks. The proposed method demonstrates significant improvements on benchmark datasets, showing its potential for advancing GZSL applications.
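The channel-then-spatial attention that CBAM applies can be sketched in PyTorch as follows. This is a minimal illustration of the standard CBAM design, not the authors' implementation; hyperparameters such as the reduction ratio of 16 and the 7×7 spatial kernel are assumptions taken from the original CBAM paper:

```python
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    """Reweights channels using a shared MLP over avg- and max-pooled descriptors."""

    def __init__(self, channels, reduction=16):  # reduction=16 is an assumed default
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))   # global average-pooled descriptor
        mx = self.mlp(x.amax(dim=(2, 3)))    # global max-pooled descriptor
        scale = torch.sigmoid(avg + mx).view(b, c, 1, 1)
        return x * scale


class SpatialAttention(nn.Module):
    """Reweights spatial locations using a conv over channel-wise avg/max maps."""

    def __init__(self, kernel_size=7):  # 7x7 kernel is an assumed default
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)    # channel-wise average map
        mx = x.amax(dim=1, keepdim=True)     # channel-wise max map
        scale = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * scale


class CBAM(nn.Module):
    """Channel attention followed by spatial attention, applied to a feature map."""

    def __init__(self, channels, reduction=16, kernel_size=7):
        super().__init__()
        self.channel = ChannelAttention(channels, reduction)
        self.spatial = SpatialAttention(kernel_size)

    def forward(self, x):
        return self.spatial(self.channel(x))
```

In practice such a module would be inserted after a ResNet101 stage, refining the backbone's feature map before it is passed to the attribute-prototype similarity branch; the output has the same shape as the input, so it drops into an existing pipeline without architectural changes.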