Abstract. In this work we propose a hierarchical state-based model for representing an echocardiogram video using objects present and their dynamic behavior. The modeling is done on the basis of the different types of views like short axis view, long axis view, apical view, etc. For view classification, an artificial neural network is trained with the histogram of a 'region of interest' of each video frame. A state transition diagram is used to represent the states of objects in different views and corresponding transition from one state to another. States are detected with the help of synthetic M-mode images. In contrast to traditional single M-mode approach, we propose a new approach named as 'Sweep M-mode' for the detection of states.