Existing works in Visual Question Answering (VQA) that introduce external knowledge mainly focus on leveraging this knowledge to supplement the language representation of the model's question input. However, such approaches ignore the commonsense information implied in the image. In this paper, we propose a novel VQA framework that embeds knowledge features into both the visual and language representations via a shared knowledge graph. To bridge the gap between visual representation and knowledge representation, we propose the knowledge-enhancing visual representation (KEVR) module, which retrieves external knowledge related to the image from the knowledge graph. With KEVR, external knowledge related to the objects in the image can be embedded into the visual representation directly. For the input question, a dedicated transformer is used to embed knowledge features into the language representation. The knowledge graph used in our model is extracted from three knowledge bases: we organize the prior knowledge as RDF triples to establish knowledge connections, and a graph neural network is then employed to extract the multilateral relationships in the knowledge graph. In addition, a two-stream transformer is employed to obtain the attention-based vision-language representation. Experimental results show that our model outperforms the best baseline by 1.34% and 2.59% in accuracy on the VQA 2.0 and OK-VQA datasets, respectively.
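The following is a minimal, illustrative sketch of the KEVR idea described above: knowledge-node embeddings derived from the RDF triples are refined with a graph neural network and then injected into object region features. All module names, dimensions, and the single-layer GCN / cross-attention choices are assumptions made for illustration, not the paper's actual implementation.

```python
# Hypothetical sketch: GNN over knowledge-graph nodes + knowledge-enhanced
# visual representation via cross-attention (names and shapes are assumptions).
import torch
import torch.nn as nn
import torch.nn.functional as F


class SimpleGCNLayer(nn.Module):
    """One graph-convolution step: mean-aggregate neighbor embeddings using
    an adjacency matrix built from the RDF triples (with self-loops)."""

    def __init__(self, dim):
        super().__init__()
        self.linear = nn.Linear(dim, dim)

    def forward(self, node_feats, adj):
        # adj: (num_nodes, num_nodes) adjacency with self-loops
        deg = adj.sum(dim=-1, keepdim=True).clamp(min=1.0)
        agg = adj @ node_feats / deg          # mean over neighbors
        return F.relu(self.linear(agg))


class KEVRFusion(nn.Module):
    """Cross-attention from object region features to knowledge-node
    embeddings, producing knowledge-enhanced visual features."""

    def __init__(self, vis_dim, kg_dim):
        super().__init__()
        self.kg_proj = nn.Linear(kg_dim, vis_dim)
        self.attn = nn.MultiheadAttention(vis_dim, num_heads=4, batch_first=True)

    def forward(self, region_feats, kg_feats):
        # region_feats: (batch, num_regions, vis_dim)
        # kg_feats:     (batch, num_nodes, kg_dim)
        kg = self.kg_proj(kg_feats)
        enhanced, _ = self.attn(query=region_feats, key=kg, value=kg)
        return region_feats + enhanced        # residual knowledge injection


if __name__ == "__main__":
    batch, num_regions, num_nodes, vis_dim, kg_dim = 2, 36, 50, 768, 300
    regions = torch.randn(batch, num_regions, vis_dim)
    nodes = torch.randn(num_nodes, kg_dim)
    adj = (torch.rand(num_nodes, num_nodes) > 0.9).float()
    adj = adj + torch.eye(num_nodes)          # add self-loops

    gcn = SimpleGCNLayer(kg_dim)
    node_feats = gcn(nodes, adj)              # refined knowledge embeddings
    fusion = KEVRFusion(vis_dim, kg_dim)
    out = fusion(regions, node_feats.unsqueeze(0).expand(batch, -1, -1))
    print(out.shape)                          # torch.Size([2, 36, 768])
```

The residual form keeps the original region features intact while adding knowledge-conditioned context, which mirrors the abstract's claim that external knowledge is embedded into the visual representation directly rather than only into the question side.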