Sea-surface targets are automatically detected and tracked in infrared (IR) and visual (VIS) band videos using the bag-of-features (BOF) technique with the scale-invariant feature transform (SIFT). Features corresponding to the sea-surface targets and the background are first clustered offline using a training set, and these clustered features are then used for online target detection with the BOF technique. To track the targets, the features corresponding to them are matched to those in the subsequent frame using a set of heuristic rules. Tracking performance is compared against that of an optical-flow-based method, using ground-truth target positions, on real IR and VIS band videos and on synthetic IR videos. The scenarios consist of videos recorded or generated at different times of day, containing single and multiple targets at different ranges and orientations. The experimental results show that sea-surface targets can be detected and tracked with plausible accuracy in both IR and VIS band videos by using the BOF technique with SIFT.
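
The online detection step described above can be illustrated with a minimal sketch. It is not the implementation evaluated here: it assumes a codebook of labelled cluster centres has already been produced by the offline clustering, and the names `CODEBOOK`, `nearest_codeword`, `detect_target_features`, and `bof_histogram` are hypothetical. Toy 2-D vectors stand in for the 128-dimensional SIFT descriptors, and the heuristic frame-to-frame matching rules are not reproduced.

```python
from collections import Counter
from math import dist

# Hypothetical codebook: (cluster centre, label) pairs, assumed to come from
# offline clustering of training descriptors. Real SIFT descriptors are 128-D;
# 2-D toy vectors are used here only to keep the example readable.
CODEBOOK = [
    ((0.0, 0.0), "background"),
    ((1.0, 1.0), "target"),
    ((0.0, 1.0), "background"),
]


def nearest_codeword(descriptor, codebook):
    """Return the index of the codeword closest to the descriptor (Euclidean)."""
    return min(range(len(codebook)), key=lambda i: dist(descriptor, codebook[i][0]))


def detect_target_features(descriptors, codebook):
    """Return indices of descriptors whose nearest codeword is labelled 'target'."""
    return [
        i
        for i, d in enumerate(descriptors)
        if codebook[nearest_codeword(d, codebook)][1] == "target"
    ]


def bof_histogram(descriptors, codebook):
    """Return the normalised codeword-occurrence histogram for one frame."""
    counts = Counter(nearest_codeword(d, codebook) for d in descriptors)
    total = len(descriptors)
    return [counts[i] / total if total else 0.0 for i in range(len(codebook))]


# Toy descriptors standing in for the features extracted from one frame.
frame_descriptors = [(0.9, 1.1), (0.1, -0.1), (0.2, 0.9), (1.2, 0.8)]
target_indices = detect_target_features(frame_descriptors, CODEBOOK)
histogram = bof_histogram(frame_descriptors, CODEBOOK)
```

In this sketch, the descriptors flagged as target features would be the ones passed on to matching against the next frame, while the histogram is the usual BOF summary of a frame or image region.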