Abstract-Our work is motivated by the desire to design packet switches with large aggregate capacity and fast line rates. In this paper, we consider building a packet switch from multiple lower speed packet switches operating independently and in parallel. In particular, we consider a (perhaps obvious) parallel packet switch (PPS) architecture in which arriving traffic is demultiplexed over k identical lower speed packet switches, switched to the correct output port, and then recombined (multiplexed) before departing from the system. In essence, the packet switch performs packet-by-packet load balancing, or inverse multiplexing, over multiple independent packet switches. Each lower speed packet switch operates at a fraction of the line rate, R; for example, each packet switch can operate at rate R/k, where k is the number of lower speed packet switches. It is a goal of our work that all memory buffers in the PPS run slower than the line rate. Ideally, a PPS would share the benefits of an output-queued switch, i.e., the delay of individual packets could be precisely controlled, allowing the provision of guaranteed qualities of service.

In this paper, we ask the question: Is it possible for a PPS to precisely emulate the behavior of an output-queued packet switch with the same capacity and the same number of ports? We show that it is theoretically possible for a PPS to emulate a first-come first-served (FCFS) output-queued (OQ) packet switch if each lower speed packet switch operates at a rate of approximately 2R/k (i.e., with a speedup of approximately two). We further show that it is theoretically possible for a PPS to emulate a wide variety of quality-of-service queueing disciplines if each lower speed packet switch operates at a rate of approximately 3R/k. These results turn out to be impractical because of their high communication complexity, but a practical high-performance PPS can be designed if we slightly relax our original goal and allow a small fixed-size coordination buffer running at the line rate in both the demultiplexer and the multiplexer.
We determine the size of this buffer and show that it can eliminate the need for a centralized scheduling algorithm, allowing a fully distributed implementation with low computational and communication complexity. Furthermore, we show that if each lower speed packet switch operates at a rate of R/k (i.e., without speedup), the resulting PPS can emulate an FCFS-OQ switch within a delay bound.

Index Terms-Clos network, inverse multiplexing, load balancing, output queueing, packet switch.
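The load-balancing idea described above can be illustrated with a toy model. The sketch below is illustrative only and is not the paper's emulation algorithm: it assumes a single input, a round-robin demultiplexer per output, and a multiplexer that reads the layers back in the same round-robin order, which preserves FCFS order per output in this simplified setting. All class and method names are hypothetical.

```python
from collections import deque

class ParallelPacketSwitch:
    """Toy PPS model: k slower switch layers, each modeled as one FIFO
    per output port. Illustrative only; not the paper's algorithm."""

    def __init__(self, k, num_ports):
        self.k = k
        # layer_queues[layer][port]: FIFO inside each lower speed switch.
        self.layer_queues = [[deque() for _ in range(num_ports)]
                             for _ in range(k)]
        self.next_write = [0] * num_ports  # demultiplexer pointer per output
        self.next_read = [0] * num_ports   # multiplexer pointer per output

    def enqueue(self, packet, port):
        # Demultiplexer: spray packets destined to `port` over the k
        # layers round-robin, so each layer sees roughly 1/k of the load.
        layer = self.next_write[port]
        self.layer_queues[layer][port].append(packet)
        self.next_write[port] = (layer + 1) % self.k

    def dequeue(self, port):
        # Multiplexer: read layers in the same round-robin order used by
        # the demultiplexer, so packets leave in arrival (FCFS) order.
        layer = self.next_read[port]
        q = self.layer_queues[layer][port]
        if not q:
            return None
        self.next_read[port] = (layer + 1) % self.k
        return q.popleft()
```

With multiple inputs, keeping departures in OQ order is exactly the hard part the paper addresses; this sketch only shows why each internal memory can run at a fraction of the line rate.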
All routers contain buffers to hold packets during times of congestion. When designing a high-capacity router (or linecard), it is challenging to design these buffers because of their speed and size, both of which grow linearly with the line rate, R. With today's DRAM technology, it is barely possible to design buffers for a 40 Gb/s linecard in which packets are written to (read from) memory at the rate at which they arrive (depart). Over time, the problem will get harder: link rates will increase, linecards will connect to more lines, and buffers will get larger. Ideally, we would like a memory with the density of DRAM and the speed of SRAM, and so some commercial routers use hybrid packet buffers built from a combination of small fast SRAM and large slow DRAM. The SRAM holds ("caches") the heads and tails of packet FIFOs, allowing arriving packets to be written quickly to the tail and departing packets to be read quickly from the head. The large DRAMs are used for bulk storage, to hold the majority of packets in each FIFO, those that are neither at the head nor the tail. Because of the relatively long time needed to write to (or read from) the DRAMs, data is transferred between SRAM and DRAM in large fixed-size blocks of b bytes, consisting of perhaps many packets at a time. A memory manager shuttles packets between the SRAM cache and the DRAM with two goals: (1) arriving packets are written to DRAM before the SRAM overflows, and (2) departing packets are guaranteed to be in the SRAM when it is their turn to leave. In this paper, we find optimal memory managers that achieve both goals while minimizing the size of the SRAM cache. When the delay through the buffer is minimized, the size of the SRAM cache is proportional to Qb ln Q, where Q is the number of FIFOs that the buffer maintains. There is a tradeoff between the size of the SRAM and the minimum pipeline delay through the packet buffer.
When a pipeline delay can be tolerated, we find memory managers that reduce the required SRAM cache size so that it is proportional to Qb.
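The head/tail caching scheme described above can be sketched as follows. This is a minimal single-FIFO model of the caching idea only, not the paper's optimal memory-management algorithm: it assumes packets are unit-sized, that a DRAM block holds b packets, and that a block transfer happens whenever a full block accumulates in the tail cache. All names are hypothetical.

```python
from collections import deque

class HybridPacketBuffer:
    """Toy model of one FIFO in a hybrid SRAM/DRAM packet buffer:
    fast SRAM caches the head and tail; slow DRAM holds the middle
    in fixed-size blocks of b packets."""

    def __init__(self, b):
        self.b = b
        self.tail_sram = deque()  # most recently arrived packets
        self.head_sram = deque()  # packets about to depart
        self.dram = deque()       # bulk storage: b-packet blocks

    def write(self, packet):
        # Arrivals always land in the fast tail cache.
        self.tail_sram.append(packet)
        # Once a full block accumulates, move it to DRAM in one
        # wide, slow transfer (amortizing DRAM access time).
        if len(self.tail_sram) >= self.b:
            block = [self.tail_sram.popleft() for _ in range(self.b)]
            self.dram.append(block)

    def read(self):
        # Departures are always served from the fast head cache;
        # refill it from DRAM a whole block at a time.
        if not self.head_sram:
            if self.dram:
                self.head_sram.extend(self.dram.popleft())
            elif self.tail_sram:
                # FIFO nearly empty: bypass DRAM entirely.
                self.head_sram.append(self.tail_sram.popleft())
        return self.head_sram.popleft() if self.head_sram else None
```

The memory-management problem the paper solves is deciding, across Q such FIFOs sharing one DRAM, which FIFO's block to transfer next so that no head cache underflows and no tail cache overflows; this sketch shows only the per-FIFO data movement.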