In this paper, infrared small target detection in video sequences is investigated. A collaborative structured sparse coding model that incorporates $L_{1,2}$ and $L_{2,1}$ regularization terms is proposed to detect infrared small targets in video sequences. Further, online dictionary learning is embedded into the model, and temporal information is incorporated to suppress clutter and noise. Finally, four simulation datasets are constructed to test the proposed method, and the experimental validation shows promising results.

Let $M$ be a matrix. We use superscripts for the rows of $M$, i.e., $M^{(i)}$ denotes the $i$-th row, and subscripts for the columns of $M$, i.e., $M_{(j)}$ denotes the $j$-th column. We use several matrix norms, with the following notation: $\|M\|_F$ is the Frobenius norm, which is also equal to $\sqrt{\mathrm{Tr}(M^T M)}$; $\|M\|_{2,1}$ is the sum of the $L_2$ norms of the rows of