Streaming is emerging as an important programming model for multicores. Streaming provides an elegant way to express task decomposition and inter-task communication while hiding laborious orchestration details from the programmer, such as load balancing, assignment of stream computation to nodes, and computation/communication scheduling. This paper develops a novel communication optimization for streaming applications based on the observation that streaming computations typically involve large, systematic data transfers between known communicating pairs of nodes over extended periods of time. Based on this observation, we advocate a family of routing algorithms that expend some overhead to compute disjoint paths for stream communication. Disjoint-path routing is an attractive design point because (a) the overheads of discovering disjoint paths are amortized over long periods of time and (b) the benefits of disjoint-path routing are significant for bandwidth-sensitive streaming applications. We develop one instance of disjoint-path routing called tentacle routing, a backtracking, best-effort technique. On a 4x4 system, tentacle routing improves mean throughput by 55% for high-network-contention streaming applications and by 28% across all streaming applications; on a 6x6 system, the corresponding improvements are 84% and 41%.

To motivate our design, we begin by describing the two key inefficiencies of coherence networks in supporting streaming communication. The first inefficiency is the mismatch between the "pull-based" communication supported by coherence networks and the "push-based" communication used by streaming applications. To understand this mismatch, consider the least disruptive way to support streaming on the coherence networks of shared-memory multicores. In such a design, streaming would be implemented on top of the underlying shared memory (i.e., by implementing streams as software-based FIFOs), and no hardware changes would be needed.
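As a rough illustration of what a software-based FIFO over shared memory could look like, the following is a minimal single-producer, single-consumer ring buffer. The class and method names are hypothetical and do not come from any particular streaming runtime. Python models neither caches nor memory ordering, so the sketch shows only the logical structure: the producer writes into a shared buffer and advances a tail index, and the consumer later reads the same locations.

```python
class SoftwareFifo:
    """Single-producer, single-consumer ring buffer (illustrative names)."""

    def __init__(self, capacity):
        # One slot is kept empty so that "full" and "empty" are distinguishable.
        self.buf = [None] * (capacity + 1)
        self.head = 0  # next slot to read; updated only by the consumer
        self.tail = 0  # next slot to write; updated only by the producer

    def push(self, value):
        """Producer side: returns False when the FIFO is full."""
        nxt = (self.tail + 1) % len(self.buf)
        if nxt == self.head:
            return False
        self.buf[self.tail] = value  # on real hardware, lands in the producer's cache
        self.tail = nxt              # publishes the element to the consumer
        return True

    def pop(self):
        """Consumer side: returns None when the FIFO is empty."""
        if self.head == self.tail:
            return None
        value = self.buf[self.head]  # on real hardware, this read pulls the line over
        self.head = (self.head + 1) % len(self.buf)
        return value
```

Because `None` doubles as the "empty" signal, this sketch assumes stream elements are never `None`. On an actual shared-memory multicore, the reads of `head`, `tail`, and the buffer slots by the other core are what generate coherence traffic.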
Coherent shared memory supports pull-based communication, in which the producing actor deposits new values in its local caches and consuming actors then "pull" them through the coherence mechanisms. In contrast, stream communication is inherently push-based: producing actors actively push data to consuming actors. Supporting push-based communication using pull-based mechanisms wastes bandwidth because each pull-based transfer requires a request and a response rather than a single data packet. In addition, because pull-based communication requires producer-consumer synchronization, additional packets may need to be exchanged as well.

[Figure: (A) Original graph and 4x4 mesh mapping, with traffic S to {C1, C2, C3, C4} and {C1, C2, C3, C4} to J; (B) DOR, with contention; (C) Disjoint-path routing.]
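The contrast between DOR and disjoint-path routing can be sketched in a few lines. The code below is a generic backtracking search for link-disjoint paths on a mesh, not the tentacle routing algorithm itself. It assumes that DOR means dimension-ordered (x-then-y) routing, that each mesh link is modeled as a directed channel, and that detours are capped at a hypothetical `slack` of extra hops. The two example flows and their placement are made up for illustration and are not taken from the figure.

```python
def xy_route(src, dst):
    """Dimension-ordered route: move along x first, then y. Returns directed links."""
    x, y = src
    links = []
    while x != dst[0]:
        nx = x + (1 if dst[0] > x else -1)
        links.append(((x, y), (nx, y)))
        x = nx
    while y != dst[1]:
        ny = y + (1 if dst[1] > y else -1)
        links.append(((x, y), (x, ny)))
        y = ny
    return links


def max_link_load(routes):
    """Largest number of routes that share any single directed link."""
    load = {}
    for links in routes:
        for link in links:
            load[link] = load.get(link, 0) + 1
    return max(load.values(), default=0)


def disjoint_routes(flows, size, slack=2):
    """Backtracking search for one link-disjoint path per flow, or None."""
    used, routes = set(), []

    def paths(node, dst, visited, links, limit):
        # Enumerate simple paths of at most `limit` hops that avoid used links.
        if node == dst:
            yield list(links)
            return
        if len(links) >= limit:
            return
        x, y = node
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            inside = 0 <= nxt[0] < size and 0 <= nxt[1] < size
            if not inside or nxt in visited or (node, nxt) in used:
                continue
            visited.add(nxt)
            links.append((node, nxt))
            yield from paths(nxt, dst, visited, links, limit)
            links.pop()
            visited.remove(nxt)

    def place(i):
        # Route flow i, then recurse; undo the choice if later flows fail.
        if i == len(flows):
            return True
        src, dst = flows[i]
        limit = abs(src[0] - dst[0]) + abs(src[1] - dst[1]) + slack
        for path in paths(src, dst, {src}, [], limit):
            used.update(path)
            routes.append(path)
            if place(i + 1):
                return True
            routes.pop()
            used.difference_update(path)
        return False

    return routes if place(0) else None


# Hypothetical flows on a 4x4 mesh whose x-then-y routes overlap.
flows = [((0, 0), (3, 1)), ((1, 0), (3, 0))]
dor = [xy_route(s, d) for s, d in flows]
disjoint = disjoint_routes(flows, size=4)
```

Here `max_link_load(dor)` exceeds 1 because both flows cross the same links along the bottom row, while `max_link_load(disjoint)` is 1 once the second flow takes a short detour. The search is exponential in the worst case, which is tolerable only because, as argued above, path discovery is amortized over long-lived stream transfers. A best-effort scheme could fall back to a shared route when the search returns `None`.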