In the context of video-based image classification, image annotation plays a vital role in improving the image classification decision based on it's semantics. Though, several methods have been introduced to adopt the image annotation such as manual and semisupervised. However, formal specification, high cost, high probability of errors and computation time remain major issues to perform image annotation. In order to overcome these issues, we propose a new image annotation technique which consists of three tiers namely frames extraction, interest point's generation, and clustering. The aim of the proposed technique is to automate the label generation of video frames. Moreover, an evaluation model to assess the effectiveness of the proposed technique is used. The promising results of the proposed technique indicate the effectiveness (77% in terms of Adjusted Random Index) of the proposed technique in the context label generation for video frames. In the end, a comparative study analysis is made between the existing techniques and proposed methodology.