Effective encoding and indexing of audiovisual documents are two key aspects of enhancing the multimedia user experience. In this paper we propose embedding low-level content descriptors into a scalable video coding bit-stream by jointly optimizing encoding and indexing performance. This approach provides a new type of bit-stream in which part of the information is used for both content encoding and content description, enabling the so-called "Midstream Content Access". To support this concept, a novel technique based on an appropriate combination of Vector Quantization and Scalable Video Coding has been developed and evaluated. More specifically, the key pictures of each video GOP are encoded at a first draft level using an optimal visual codebook, while the residual errors are encoded using a conventional approach. The same visual codebook is used to encode all the key pictures of a video shot, whose boundaries are dynamically estimated. In this way, the visual codebook is freely available as an efficient visual descriptor of the considered video shot. Moreover, since a new visual codebook is introduced every time a new shot is detected, an implicit temporal segmentation is also provided.
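As a rough illustration of the draft-plus-residual idea described above, the following minimal sketch trains one codebook for the blocks of a shot's key pictures, encodes each block as a codeword index, and keeps the residual separately. It rests on several assumptions that are not taken from the paper: plain k-means stands in for the "optimal" codebook design, blocks are flat lists of pixel values, the residual is kept losslessly instead of being passed to a conventional scalable coder, and all function names (`train_codebook`, `encode_key_picture`, `decode_key_picture`) are hypothetical.

```python
import random


def nearest(codebook, block):
    """Index of the codeword closest to `block` (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((c - b) ** 2 for c, b in zip(codebook[i], block)),
    )


def train_codebook(blocks, size, iterations=10, seed=0):
    """Plain k-means training; an assumed stand-in for the codebook design."""
    rng = random.Random(seed)
    codebook = [list(b) for b in rng.sample(blocks, size)]
    for _ in range(iterations):
        clusters = [[] for _ in codebook]
        for b in blocks:
            clusters[nearest(codebook, b)].append(b)
        for i, members in enumerate(clusters):
            if members:  # keep the previous codeword if its cluster is empty
                codebook[i] = [sum(col) / len(members) for col in zip(*members)]
    return codebook


def encode_key_picture(blocks, codebook):
    """Draft layer: codeword indices. Refinement: per-block residuals."""
    indices = [nearest(codebook, b) for b in blocks]
    residuals = [
        [x - c for x, c in zip(b, codebook[i])] for b, i in zip(blocks, indices)
    ]
    return indices, residuals


def decode_key_picture(indices, residuals, codebook):
    """Rebuild blocks by adding each residual to its draft codeword."""
    return [
        [c + r for c, r in zip(codebook[i], res)]
        for i, res in zip(indices, residuals)
    ]


# Toy data: four 2x2 blocks (flattened) from the key pictures of one shot.
shot_blocks = [
    [10, 12, 11, 13],
    [200, 198, 202, 199],
    [12, 11, 10, 12],
    [201, 200, 199, 203],
]
codebook = train_codebook(shot_blocks, size=2)
indices, residuals = encode_key_picture(shot_blocks, codebook)
reconstructed = decode_key_picture(indices, residuals, codebook)
```

In this sketch the codebook plays both roles mentioned above: it is needed to decode the draft layer, and it can be read on its own as a compact summary of the shot's visual content. Training a fresh codebook whenever a new shot starts is what would make the shot boundaries visible in the stream.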