Multithreading and multi-core processing have been shown to be powerful approaches for boosting a system performance by taking advantage of parallelism in applications. This paper presents a processor design by unifying RISC and multithreading DSP for the sophisticated multimedia applications with advanced standards such as H.264. The proposed design not only minimizes integration costs for embedded multithreading/multi-core design by independent coherent threads, but also reduces the memory bandwidth requirements by one-stop streaming buffer and a very fast data exchange mechanism. With the proposed techniques and appropriate programming model, we can achieve 78% reduction of memory bandwidth and 89% reduction of processing time in H.264 video encoding, compared to traditional single stream micro-processor.