Fine-grained video content indexing, retrieval, and adaptation require accurate metadata that describe the structure and semantics of a video down to the lowest granularity, i.e., the object level. We address these requirements by proposing the Semantic Video Content Annotation Tool (SVCAT) for structural and high-level semantic annotation. SVCAT is a semi-automatic, MPEG-7-compliant annotation tool that produces metadata according to a new object-based video content model. Videos are temporally segmented into shots, and shot-level concepts are detected automatically using ImageNet as background knowledge. These concepts guide the annotator in locating and selecting objects of interest, which are then tracked automatically. Integrating shot-based concept detection with object localization and tracking considerably eases the annotator's task. As such, SVCAT makes it easy to generate selective, fine-grained metadata, which are vital for user-centric, object-level semantic video operations such as product placement or obscene material removal. Experimental results show that SVCAT is able to provide accurate object-level video metadata.
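The hierarchy implied by this description (a video split into shots, each shot carrying detected concepts and tracked objects) can be illustrated with a small data-structure sketch. This is only an illustration under stated assumptions: every class, field, and function name below is hypothetical and is taken neither from SVCAT nor from the MPEG-7 schema, and the sample values are made up.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical names throughout; this is not SVCAT's or MPEG-7's actual schema.
BoundingBox = Tuple[int, int, int, int]  # (x, y, width, height) in pixels


@dataclass
class TrackedObject:
    """An object of interest, with one bounding box per frame it appears in."""
    label: str
    boxes: Dict[int, BoundingBox] = field(default_factory=dict)  # frame index -> box

    def frame_span(self) -> Tuple[int, int]:
        """First and last frame index of the track (requires at least one box)."""
        return min(self.boxes), max(self.boxes)


@dataclass
class Shot:
    """A temporal segment with its shot-level concepts and tracked objects."""
    start_frame: int
    end_frame: int  # inclusive
    concepts: List[str] = field(default_factory=list)
    objects: List[TrackedObject] = field(default_factory=list)


@dataclass
class Video:
    uri: str
    shots: List[Shot] = field(default_factory=list)

    def find_objects(self, label: str) -> List[Tuple[Shot, TrackedObject]]:
        """Return every (shot, object) pair whose object carries the given label."""
        return [(s, o) for s in self.shots for o in s.objects if o.label == label]


def build_example() -> Video:
    """Build a tiny, made-up annotation: one shot containing one tracked object."""
    bottle = TrackedObject(
        label="bottle",
        boxes={10: (40, 60, 20, 50), 11: (42, 60, 20, 50), 12: (44, 61, 20, 50)},
    )
    shot = Shot(start_frame=0, end_frame=99,
                concepts=["kitchen", "bottle"], objects=[bottle])
    return Video(uri="example.mp4", shots=[shot])


if __name__ == "__main__":
    for shot, obj in build_example().find_objects("bottle"):
        print(shot.start_frame, shot.end_frame, obj.frame_span())
```

Keeping per-frame boxes on the object and concepts on the shot mirrors the two levels of annotation described above: an object-level query such as `find_objects("bottle")` returns the shots and frame ranges that an operation like product placement would need to touch.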