In the internet age, visually impaired users increasingly need to perceive graphic images through touch. Image object recognition is a basic task in computer vision, and deep neural networks have advanced it considerably in recent years. However, existing methods generally suffer from loss of image detail and poorly refined edges, which limits object recognition accuracy for visually impaired users. To address this problem, this study proposes a graphic perception system built on an improved attention mechanism. The system consists of three main modules: a mixing attention module (MAM), an enhanced receptive field module (ERFM), and a multilevel fusion module (MLAM). MAM generates better semantic features, which guide feature fusion during decoding so that the aggregated features locate salient objects more accurately. ERFM enriches the context information of low-level features and passes the enhanced features to MLAM. MLAM uses the semantic information generated by MAM to guide the fusion of the current decoded features with the low-level features output by ERFM, gradually recovering boundary details in a cascaded manner. Finally, the proposed algorithm is compared with other algorithms on the PASCAL VOC and MS-COCO datasets. Experimental results show that the proposed method effectively improves the accuracy of graphic object recognition.