2006 1st International Conference on Communication Systems Software & Middleware
DOI: 10.1109/comswa.2006.1665153

A Measurement Study of the Linux TCP/IP Stack Performance and Scalability on SMP systems

Abstract: The performance of the protocol stack implementation of an operating system can greatly impact the performance of networked applications that run on it. In this paper, we present a thorough measurement study and comparison of the network stack performance of the two popular Linux kernels: 2.4 and 2.6, with a special focus on their performance on SMP architectures. Our findings reveal that interrupt processing costs, device driver overheads, checksumming and buffer copying are dominant overheads of pro…

Cited by 14 publications (5 citation statements). References 15 publications.
“…Data destined for a thread running on CPU0 may be received by the kernel on CPU1 causing cache related slowdowns. Therefore, the dilation in TCP processing times seen in 64x2 run is very likely cache related ( [24] also found evidence of TCP/IP cache problems on SMP).…”
Section: Chiba Experiments
confidence: 90%
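The slowdown described in this excerpt arises when the thread consuming the data and the kernel's receive-side protocol processing run on different CPUs, so socket buffers bounce between caches. As a rough, illustrative sketch of the application side of keeping work on one core, the snippet below pins a simple receive loop to a single CPU using Linux's sched_setaffinity (exposed in Python as os.sched_setaffinity); the CPU index and port are arbitrary assumptions, and pinning the process does not, by itself, control where the kernel or the NIC interrupt handler runs.

```python
# Illustrative sketch only (Linux): pin this process to one CPU so that
# user-space socket processing stays on a single core. CPU 0 and port 5001
# are arbitrary choices for the example, not values from the cited papers.
import os
import socket

TARGET_CPU = 0  # assumed CPU; in practice chosen to match the NIC's IRQ affinity

def pin_to_cpu(cpu: int) -> None:
    """Restrict the calling process (pid 0 = self) to the given CPU."""
    os.sched_setaffinity(0, {cpu})
    print(f"now restricted to CPUs {sorted(os.sched_getaffinity(0))}")

def serve_once(port: int = 5001) -> None:
    """Accept one TCP connection and drain it with the process pinned to one CPU."""
    pin_to_cpu(TARGET_CPU)
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        srv.bind(("", port))
        srv.listen(1)
        conn, _ = srv.accept()
        with conn:
            # Drain the connection; all user-space copies now happen on TARGET_CPU.
            while conn.recv(64 * 1024):
                pass

if __name__ == "__main__":
    serve_once()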
“…iperf is a TCP/UDP-based network bandwidth measurement application that reports the network bandwidth of the system. The major network overheads are interrupt processing cost, device driver overhead, checksumming, and buffer copying (buffer copying alone accounts for about 23%) [11]. Exploiting DMA reduces CPU overhead for memory copy, but it still suffers from throughput limitations originating from the CPU-side datapath [50].…”
Section: Experimental Methodology
confidence: 99%
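For context on the measurement tool named in this excerpt, the following is a minimal, illustrative sketch of an iperf-style TCP throughput test; it is not iperf itself, and the port number, chunk size, and transfer volume are arbitrary assumptions chosen for the example.

```python
# Minimal sketch of an iperf-style TCP throughput measurement (illustrative only).
import socket
import sys
import time

PORT = 5001            # assumed port, similar to iperf's default
CHUNK = 64 * 1024      # 64 KiB send/receive buffer
TOTAL = 256 * 1024**2  # transfer 256 MiB per run

def server() -> None:
    """Accept one connection, drain it, and report receive-side throughput."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        srv.bind(("", PORT))
        srv.listen(1)
        conn, _ = srv.accept()
        with conn:
            received, start = 0, time.monotonic()
            while True:
                data = conn.recv(CHUNK)
                if not data:
                    break
                received += len(data)
            elapsed = time.monotonic() - start
            print(f"received {received / 1e6:.1f} MB in {elapsed:.2f} s "
                  f"({received * 8 / elapsed / 1e6:.1f} Mbit/s)")

def client(host: str) -> None:
    """Send TOTAL bytes to the server and report send-side throughput."""
    payload = b"\x00" * CHUNK
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.connect((host, PORT))
        sent, start = 0, time.monotonic()
        while sent < TOTAL:
            sock.sendall(payload)
            sent += len(payload)
    elapsed = time.monotonic() - start
    print(f"sent {sent / 1e6:.1f} MB in {elapsed:.2f} s "
          f"({sent * 8 / elapsed / 1e6:.1f} Mbit/s)")

if __name__ == "__main__":
    # Usage: `python throughput.py server` on one host,
    #        `python throughput.py client <server-ip>` on the other.
    if sys.argv[1] == "server":
        server()
    else:
        client(sys.argv[2])
```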
“…Data destined for a thread running on CPU0 may be received by the kernel on CPU1 causing cache related slowdowns. Therefore, the dilation in TCP processing times seen in 64x2 run is very likely cache related ( [19] also found TCP/IP cache problems on SMP).…”
Section: Chiba Experiments
confidence: 95%