In Chinese higher education, gender remains a significant issue, compounded by a general ignorance of gender discrimination against women. Gender-related issues can be observed throughout the educational process: at entry to an institution, during education itself, and in its outcomes. Seven aspects of gender discrimination occur in the Chinese higher education system: (1) women have fewer opportunities in higher education than men; (2) gender segregation and streaming exist within disciplines and specializations; (3) considerable gender differences exist in the distribution of school resources; (4) teaching materials and teaching content are gender discriminatory; (5) student organizations within higher education institutions show a degree of gender imbalance; (6) campus culture carries a hidden agenda of gender discrimination; and (7) employment prospects for women tend to be unequal and discriminatory.
The self-attention mechanism, which has been successfully applied in the current encoder-decoder framework for image captioning, is used to enhance feature representations in the image encoder and capture the information most relevant to the language decoder. However, most existing methods assign attention weights to all candidate vectors, implicitly hypothesizing that all vectors are relevant. Moreover, current self-attention mechanisms consider only inter-object relationships and ignore the intra-object attention distribution. In this paper, we propose a Multi-Gate Attention (MGA) block, which extends traditional self-attention with an additional Attention Weight Gate (AWG) module and a Self-Gated (SG) module. The former constrains attention weights to be assigned to the most contributive objects; the latter models the intra-object attention distribution and eliminates irrelevant information within each object feature vector. Furthermore, most current image captioning methods directly apply the original transformer, designed for natural language processing tasks, to refine image features. We therefore propose a pre-layernorm transformer that simplifies the transformer architecture and makes it more efficient for image feature enhancement. By integrating the MGA block with the pre-layernorm transformer architecture into the image encoder, and the AWG module into the language decoder, we present a novel Multi-Gate Attention Network (MGAN). Experiments on the MS COCO dataset indicate that MGAN outperforms most state-of-the-art methods, and further experiments combining other methods with MGA blocks demonstrate the generalizability of our proposal.

INDEX TERMS Image captioning, self-attention, transformer, multi-gate attention.
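The two gates described above can be illustrated with a minimal NumPy sketch. The abstract does not specify the internals of the AWG and SG modules, so the concrete choices here are assumptions: the AWG is approximated as a top-k mask that zeroes all but the k largest attention weights per query (so only the most contributive objects receive weight), and the SG as an element-wise sigmoid gate on the attended output (suppressing irrelevant dimensions within each object vector). The function name `multi_gate_attention` and the parameter `k` are illustrative, not from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_gate_attention(Q, K, V, k=3):
    """Single-head self-attention with two hypothetical gates.

    AWG (assumed as top-k masking): keep only the k largest attention
    weights per query row and renormalize, so attention is constrained
    to the most contributive objects.
    SG (assumed as a sigmoid self-gate): element-wise gate on the
    attended output, modeling intra-object relevance.
    """
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)            # (n, n) inter-object scores
    weights = softmax(scores, axis=-1)

    # Attention Weight Gate: zero the n-k smallest weights in each row
    drop_idx = np.argsort(weights, axis=-1)[:, :-k]
    gated = weights.copy()
    np.put_along_axis(gated, drop_idx, 0.0, axis=-1)
    gated /= gated.sum(axis=-1, keepdims=True)

    out = gated @ V                           # attended object features
    # Self-Gate: element-wise sigmoid gate on each output vector
    return out * (1.0 / (1.0 + np.exp(-out))), gated
```

Returning the gated weight matrix alongside the output makes the sparsifying effect of the AWG directly inspectable: each row keeps exactly k nonzero weights that sum to one.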