Visual attention is one of the most important mechanisms in human visual perception. Recently, its modeling has become a principal requirement for optimizing image processing systems. Numerous algorithms have already been designed for 2D saliency prediction; however, only a few works address 3D content. In this study, we propose a saliency model for stereoscopic 3D video. The algorithm extracts information from three dimensions of the content: spatial, temporal, and depth. The model exploits the tendency of interest points to lie close to human fixations in order to build spatial salient features. Moreover, since the perception of depth relies strongly on monocular cues, our model extracts depth salient features from pictorial depth sources. Because the weights of the fusion strategy are often selected in an ad-hoc manner, we instead propose a machine learning approach: an artificial neural network defines adaptive weights based on eye-tracking data. The results of the proposed algorithm are evaluated against ground-truth fixation data using state-of-the-art techniques.
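The learned fusion step can be illustrated with a minimal toy sketch: a single linear neuron that learns adaptive weights for combining the spatial, temporal, and depth feature maps by fitting a (here synthetic) eye-tracking ground-truth map. All names, map sizes, the synthetic target, and the single-neuron training setup are illustrative assumptions, not the actual network described in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the three feature maps (spatial, temporal, depth)
# and a fixation density map from eye tracking. Shapes and values are
# purely illustrative assumptions.
H, W = 32, 32
spatial = rng.random((H, W))
temporal = rng.random((H, W))
depth = rng.random((H, W))
# Synthetic ground truth: an exact linear mix, so the learned weights
# should recover the coefficients below.
ground_truth = 0.5 * spatial + 0.3 * temporal + 0.2 * depth

# One training sample per pixel, one feature per map.
X = np.stack([spatial, temporal, depth], axis=-1).reshape(-1, 3)
y = ground_truth.reshape(-1)

# A single linear neuron (the simplest possible "neural network")
# trained by gradient descent on the mean squared error.
w = np.zeros(3)
lr = 0.5
for _ in range(2000):
    pred = X @ w
    grad = 2.0 * X.T @ (pred - y) / len(y)
    w -= lr * grad

# Fused saliency map using the learned adaptive weights.
fused = (X @ w).reshape(H, W)
```

In this contrived setting the learned weights converge close to the generating coefficients (0.5, 0.3, 0.2); on real eye-tracking data the fit is of course only approximate, and the paper's model is a full neural network rather than one neuron.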