In the electric power domain, knowledge graphs provide technical support for smart grid construction, and relation extraction is a key step in building them. News data about electric power equipment contains knowledge of high application value. For such multimodal sources, however, the effectiveness of general relation extraction models drops sharply when the text carries very little entity information. To address this, this study proposes a model for the multimodal entity relation extraction task over combined images and text, based on multimodal semantic fusion performed after image description generation. The model takes into account entity information from both sources, image and text, and extracts the relations between entities. Comparative experiments show that the proposed model outperforms other similar models and achieves the highest accuracy among them.