In this paper, we propose a deep-learning technique to recognize multiple actions in a video. The proposed approach interprets the overall context of a video and transforms it into one or more appropriate action labels. To cope with multiple actions in a single video, our technique first determines the individual segments/shots in the video using intersections of color histograms. The segmented parts are then fed to an action recognition system comprising a Convolutional Neural Network (CNN) combined with a Long Short-Term Memory (LSTM) network trained on our action vocabulary. The segments are labeled according to their predicted actions, and a compact set of distinct actions is produced. Using the corpus generated by the shot detection phase, which includes the locations of key frames and the start/end timestamps of each shot, we can also perform video segmentation based on an action query. Hence, the proposed technique can be used for a number of tasks such as content censoring, on-demand scene retrieval, video summarization, and query-based scene/video retrieval. The proposed technique also stands apart from existing approaches, which either do not take motion information into account for action prediction or do not perform action-based video segmentation. The experimental results presented in this paper show that the proposed technique not only finds the complete set of actions present in a video, but can also retrieve all parts of a video relevant to an action query.
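The histogram-intersection shot detection mentioned above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the bin count, the boundary threshold, and all function names are illustrative assumptions.

```python
import numpy as np

def color_histogram(frame, bins=8):
    # Normalized per-channel color histogram of an RGB frame (H, W, 3).
    # bins=8 is an assumed value, not taken from the paper.
    hist = np.concatenate([
        np.histogram(frame[..., c], bins=bins, range=(0, 256))[0]
        for c in range(3)
    ]).astype(float)
    return hist / hist.sum()

def histogram_intersection(h1, h2):
    # Intersection of two normalized histograms: 1.0 for identical
    # color distributions, approaching 0.0 for disjoint ones.
    return np.minimum(h1, h2).sum()

def detect_shot_boundaries(frames, threshold=0.5):
    # Return indices i where a new shot is assumed to start at frames[i],
    # i.e. where consecutive frames' histograms overlap too little.
    # threshold=0.5 is an illustrative cutoff.
    boundaries = []
    prev = color_histogram(frames[0])
    for i in range(1, len(frames)):
        cur = color_histogram(frames[i])
        if histogram_intersection(prev, cur) < threshold:
            boundaries.append(i)
        prev = cur
    return boundaries
```

For example, a sequence of uniformly dark frames followed by uniformly bright ones yields a single boundary at the cut, and each detected segment could then be passed to the CNN+LSTM recognizer.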