The extraction of semantic information from digital video is important for personalization services, in which content is adapted to each user's preferences. Although several approaches can be found in the literature, automatic indexing techniques are able to generate semantic metadata only when the content's domain is restricted. Alternatively, this information can be created manually by professionals, but that activity is time-consuming and error-prone. A possible solution is to explore collaborative user annotations, but this approach has the disadvantage that annotations lose their individuality, hampering the extraction of user preferences from the interaction. This work proposes a generic personalization architecture that allows multimedia indexing to be accomplished in an inexpensive and unrestricted way. The architecture uses collaborative annotations but preserves the individuality of the data in order to augment each user's profile with relevant concepts. The multimodality of metadata and user preferences is also explored, which provides robustness during the extraction of semantic information and brings benefits to applications. Finally, this work presents two personalization services built on the proposed architecture, along with evaluations that compare the obtained results with previously proposed approaches.