In this paper, we explore real-time N-way video communication over enterprise networks based on scalable video coding (SVC, the scalable extension of H.264/AVC). We present a bitstream extraction strategy based on GOP size prediction and on the perceptual importance of temporal and quality enhancements, and validate the strategy through initial subjective testing. We also compare it with extraction based on the average bit rate of each constituent layer of the bitstream. Our extraction adapts to variations in available bandwidth and is proxy driven, in that the decision process and the adaptation are performed at proxy servers located at the edges of the backbone network. The main goals are to 1) maximize the user-perceived quality even during deteriorating channel conditions, 2) maximize the reaction speed to changes in available bandwidth, and 3) minimize the extraction delay. We report objective and subjective results for the N=2 case, with HD video clips encoded at 600-900 kbps. With a channel utilization of 92.7%, our extraction algorithm based on GOP size prediction shows an average increase in PSNR of about 2.2 dB over the extraction based on average bit rate. Initial subjective tests also indicate that our layer extraction strategy is perceptually more efficient than other extraction schemes.
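
As a rough illustration of extraction driven by predicted GOP size, the following minimal sketch selects layers for the next GOP so that their predicted size fits the bit budget implied by the available bandwidth. It is not the algorithm evaluated in this paper: the `Layer` type, the `extract_layers` function, the layer labels, the priority ranking, and the rule that the base layer is always kept are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Layer:
    name: str            # hypothetical label, e.g. "base", "T1", "Q1"
    predicted_bits: int  # predicted size of this layer in the next GOP
    priority: int        # lower value = assumed higher perceptual importance


def extract_layers(layers: List[Layer], bandwidth_bps: float,
                   gop_seconds: float) -> List[str]:
    """Pick layers for the next GOP within the available bit budget.

    Assumption: the lowest-priority-value layer is the base layer and is
    always kept; enhancement layers are added in priority order until the
    next one would exceed the budget.
    """
    budget = bandwidth_bps * gop_seconds  # bits available for one GOP
    chosen: List[str] = []
    used = 0
    for index, layer in enumerate(sorted(layers, key=lambda l: l.priority)):
        # Stop at the first enhancement layer that does not fit, so that
        # no layer is sent without the layers ranked above it.
        if index > 0 and used + layer.predicted_bits > budget:
            break
        chosen.append(layer.name)
        used += layer.predicted_bits
    return chosen


example = [
    Layer("base", 300_000, 0),
    Layer("T1", 100_000, 1),
    Layer("Q1", 200_000, 2),
]
selected = extract_layers(example, bandwidth_bps=800_000, gop_seconds=0.5)
```

An extraction based on average bit rate would use a long-term average for each layer in place of `predicted_bits`, which is why it can overshoot or underuse the channel on GOPs whose size differs from the average.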