Computer technologies are advancing rapidly, and the field of artificial intelligence in particular is flourishing. By supporting decision making, it has recently begun to close gaps between humans and machines. One such gap lies in video surveillance: the attentiveness of human camera operators is limited, so violent actions in the monitored scenes are not detected instantly. In this paper, we present an end-to-end deep neural network that detects violent scenes in surveillance camera footage. The proposed system consists of several phases: it extracts a set of selectively distributed frames from the video clip, extracts spatio-temporal features from them, and passes these features to a fully connected neural network that classifies the video as violent or non-violent. The model is evaluated on different datasets, including the Real Life Violence Situations (RLVS) and Hockey Fight Detection datasets, on which it achieved accuracies of 92% and 94.5%, respectively, outperforming previous related work.
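As a rough illustration of the frame-extraction phase only, the sketch below picks frame indices spread across a clip. The abstract does not specify how the frames are distributed, so even spacing is an assumption made here, and the function name `sample_frame_indices` and its parameters are hypothetical rather than taken from the paper.

```python
def sample_frame_indices(total_frames: int, num_samples: int) -> list[int]:
    """Return up to num_samples frame indices spread evenly across a clip.

    Assumption: frames are evenly spaced. The paper only says the frames
    are "selectively distributed", so the actual rule may differ.
    """
    if total_frames <= 0 or num_samples <= 0:
        raise ValueError("total_frames and num_samples must be positive")
    # A clip shorter than the requested sample count yields every frame.
    if num_samples >= total_frames:
        return list(range(total_frames))
    # Split the clip into equal-width segments and take the frame at the
    # centre of each segment.
    step = total_frames / num_samples
    return [int(step * i + step / 2) for i in range(num_samples)]


if __name__ == "__main__":
    # Example: choose 4 frames from a 100-frame clip.
    print(sample_frame_indices(100, 4))
```

The selected frames would then be decoded and passed to the feature extractor; sampling indices first avoids decoding frames that are never used.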