Abstract-One of the most widely used architectures for packet switches is the crossbar. A special version of it is the buffered crossbar, where small buffers are associated with the crosspoints. The advantages of this organization, when compared to the unbuffered architecture, are that it needs much simpler and slower scheduling circuits, and that it can shape the switched traffic according to a given set of Quality of Service (QoS) criteria more efficiently. Furthermore, by supporting variable length packets throughout a buffered crossbar: a) there is no need for segmentation and reassembly circuits, b) no internal speedup is necessary, and c) synchronization between the input and output clock domains is simplified. In this paper we present an architecture, a hardware implementation analysis, and a performance evaluation of such a buffered crossbar. The proposed organization is simple, yet powerful, and can be easily implemented using today's technologies. Our evaluation shows that it outperforms most of the existing packet switch architectures, while its hardware cost is kept to a minimum.
Abstract-Buffered crossbars can directly switch variable size packets, but require large crosspoint buffers to do so, especially when jumbo frames are to be supported. When this is not feasible, segmentation and reassembly (SAR) must be used. We propose a novel SAR scheme for buffered crossbars that uses variable-size segments while merging multiple packets (or fragments thereof) into each segment. This scheme eliminates padding overhead, reduces header overhead, reduces crosspoint buffer size and is suitable for use with external, modern DRAM buffer memory in the ingress line cards. We evaluate the new scheme using simulation, and show that it outperforms existing segmentation schemes in buffered as well as unbuffered crossbars. We also study how the size of the maximum segment affects system performance.
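As a rough illustration of why merging packets into variable-size segments removes padding, the sketch below compares fixed-size cell segmentation with a merged byte-stream segmentation. The function names, the 64-byte unit, and the assumption that all queued packets are headed to the same output are illustrative choices, not details taken from the paper; header overhead is ignored.

```python
def fixed_cell_padding(packet_sizes, cell_payload):
    """Padding bytes wasted when each packet is cut into fixed-size cells
    and the last cell of every packet is padded to full size."""
    padding = 0
    for size in packet_sizes:
        remainder = size % cell_payload
        if remainder:
            padding += cell_payload - remainder
    return padding


def merged_segments(packet_sizes, max_segment):
    """Treat the queued packets as one byte stream (assumption: same
    output port) and cut it into segments of at most max_segment bytes.
    Only the final segment is shorter, so no padding is needed."""
    total = sum(packet_sizes)
    segments = [max_segment] * (total // max_segment)
    if total % max_segment:
        segments.append(total % max_segment)
    return segments


if __name__ == "__main__":
    queue = [40, 40, 1500]          # hypothetical packet sizes in bytes
    print(fixed_cell_padding(queue, 64))
    print(merged_segments(queue, 64))
```

In this toy case the segment payloads sum exactly to the queued bytes, whereas the fixed-size cells carry extra padding for every packet whose length is not a multiple of the cell payload.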
Abstract-Buffered crossbars can directly switch variable size packets but they require significant crosspoint buffering to do so, especially when the traffic includes large packets. When we cannot afford large crosspoint buffers we are forced to restrict the maximum internal transfer unit by segmenting packets. Packet segmentation implies a reassembly delay cost, which is an issue in systems requiring low latency. We drastically reduce reassembly delay by applying packet mode scheduling to the buffered crossbar architecture. Packet mode scheduling has been studied in input queued switches: when the central switch scheduler establishes a connection from a switch input to a switch output port, it maintains that connection until all the cells of the packet are switched. In buffered crossbars the scheduling is distributed across the switch input and output ports, thus the extension is not trivial. We synchronize the input and output port schedulers so that whenever their independent decisions result in an input-output port pairing, they maintain that pairing for the lifetime of the packet transmission. Using simulation we study our system's performance. We show that reassembly delay is significantly reduced, especially under light loads.
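The "hold the pairing until the packet ends" idea can be sketched with a single round-robin arbiter that locks onto a port until it sees the last cell of a packet. This is only a minimal sketch: the class name, the `is_last_cell` mapping, and the choice to idle while the locked port has no cell ready are assumptions, and the paper's actual synchronization between input and output schedulers is not reproduced here.

```python
class PacketModeArbiter:
    """Round-robin arbiter that keeps serving the same port until the
    last cell of the current packet has been granted (hypothetical)."""

    def __init__(self, num_ports):
        self.num_ports = num_ports
        self.pointer = 0      # round-robin starting position
        self.locked = None    # port whose packet is in flight, if any

    def grant(self, requests, is_last_cell):
        """requests: set of requesting port indices.
        is_last_cell: dict mapping port -> True if its head cell ends
        a packet. Returns the granted port, or None if nothing is served."""
        if self.locked is not None:
            if self.locked not in requests:
                return None   # assumption: idle rather than break pairing
            choice = self.locked
        else:
            choice = None
            for k in range(self.num_ports):
                port = (self.pointer + k) % self.num_ports
                if port in requests:
                    choice = port
                    break
            if choice is None:
                return None
        if is_last_cell[choice]:
            self.locked = None
            self.pointer = (choice + 1) % self.num_ports
        else:
            self.locked = choice
        return choice
```

For example, if port 0 holds a three-cell packet and port 2 a one-cell packet, the arbiter serves port 0 three times in a row before moving on to port 2, so the cells of a packet are never interleaved with those of another.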
Abstract-It is widely believed that bufferless crossbar switches with virtual-output queues (VOQ) at their inputs can only operate when their input-output connections are reconfigured in synchrony, i.e. only under fixed-size cell traffic. Packet-mode scheduling has been studied, but, again, assuming that all packets consist of an integer number of cells, where the scheduling time coincides with the cell time. We show that bufferless crossbars can operate directly on variable-size packets, with input-output connections being made and torn down asynchronously with respect to each other. Although such operation can initially be thought of as an extension of packet-mode scheduling, the critical difference is that now the scheduling time is much longer than packet-size granularity. We study a transformation of the well-known iSLIP scheduling algorithm to an asynchronous mode of operation, and we show by simulation that it can be adapted to yield throughput close to 100% under a range of workloads. The overall result is an efficient scheduling operation, with the added advantages of eliminating (a) packet fragmentation overhead (no partially filled cells), and (b) packet reassembly in the egress datapath.
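For background, one iteration of the conventional, synchronous iSLIP algorithm can be sketched as follows. This is the standard cell-based request/grant/accept scheme, not the asynchronous transformation studied in the paper; the function and variable names are illustrative.

```python
def islip_iteration(requests, grant_ptr, accept_ptr):
    """One request-grant-accept iteration of synchronous iSLIP.

    requests[i][j] is True if input i has a cell queued for output j.
    grant_ptr[j] and accept_ptr[i] are the round-robin pointers; they
    are updated in place, and only for grants that are accepted.
    Returns a list of (input, output) matches."""
    n = len(requests)

    # Grant phase: each output grants the first requesting input at or
    # after its grant pointer.
    grants = {}  # input -> list of outputs that granted it
    for out in range(n):
        for k in range(n):
            inp = (grant_ptr[out] + k) % n
            if requests[inp][out]:
                grants.setdefault(inp, []).append(out)
                break

    # Accept phase: each input accepts the first granting output at or
    # after its accept pointer, then both pointers move one past the match.
    matches = []
    for inp, outs in grants.items():
        for k in range(n):
            out = (accept_ptr[inp] + k) % n
            if out in outs:
                matches.append((inp, out))
                accept_ptr[inp] = (out + 1) % n
                grant_ptr[out] = (inp + 1) % n
                break
    return matches
```

With a fully loaded 2x2 switch and all pointers at zero, the first iteration matches only one pair, while the second finds a full match: the pointer updates desynchronize the arbiters, which is the property behind iSLIP's high throughput under uniform traffic.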
Space-time coding (STC) is an important milestone in modern wireless communications. In this technique, multiple copies of the same signal are transmitted through different antennas (space) and different symbol periods (time), to improve the robustness of a wireless system by increasing its diversity gain. STCs are channel coding algorithms that can be readily implemented on a field programmable gate array (FPGA) device. This work provides figures for the amount of FPGA hardware resources required, the speed at which the algorithms can operate, and the power consumption of a space-time block code (STBC) encoder. Seven encoder very high-speed integrated circuit hardware description language (VHDL) designs have been coded, synthesised and tested. Each design realises a complex orthogonal space-time block code with a different transmission matrix. All VHDL designs are parameterisable in terms of sample precision. Precisions ranging from 4 bits to 32 bits have been synthesised. Alamouti's STBC encoder design [Alamouti, S.M. (1998), 'A Simple Transmit Diversity Technique for Wireless Communications', IEEE Journal on Selected Areas in Communications, 16:55-108.] proved to be the best trade-off, since it is on average 3.2 times smaller, 1.5 times faster and requires slightly less power than the next best trade-off in the comparison, which is a 3/4-rate full-diversity 3Tx-antenna STBC.
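The Alamouti transmission matrix mentioned above is small enough to sketch in software. The following is a behavioural model only, assuming the usual convention in which a symbol pair (s1, s2) is sent as (s1, s2) in the first symbol period and (-s2*, s1*) in the second; the function name is illustrative and the code says nothing about the VHDL designs evaluated in the work.

```python
def alamouti_encode(symbols):
    """Map an even-length list of complex symbols onto two antennas.

    For each pair (s1, s2):
      period 1: antenna 1 sends s1,       antenna 2 sends s2
      period 2: antenna 1 sends -conj(s2), antenna 2 sends conj(s1)
    Returns (antenna1_stream, antenna2_stream)."""
    if len(symbols) % 2:
        raise ValueError("Alamouti encoding needs an even number of symbols")
    ant1, ant2 = [], []
    for i in range(0, len(symbols), 2):
        s1, s2 = complex(symbols[i]), complex(symbols[i + 1])
        ant1 += [s1, -s2.conjugate()]
        ant2 += [s2, s1.conjugate()]
    return ant1, ant2
```

Within each two-period block the two antenna streams are orthogonal (their inner product is zero), which is what lets a receiver separate the symbols with simple linear processing.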
Orthogonal frequency division multiplexing (OFDM)-based feed-forward space-time trellis code (FFSTTC) encoders can be synthesised as very high speed integrated circuit hardware description language (VHDL) designs. Evaluation of their FPGA implementation can lead to conclusions that help a designer decide on the optimum implementation, given the encoder structural parameters. VLSI architectures based on 1-bit multipliers and look-up tables (LUTs) are compared in terms of FPGA slices and block RAMs (area), as well as in terms of minimum clock period (speed). Area and speed graphs versus encoder memory order are provided for quadrature phase shift keying (QPSK) and 8 phase shift keying (8-PSK) modulation and two transmit antennas, revealing the best implementation under these conditions. The effect of the number of modulation bits and transmit antennas on the encoder implementation complexity is also investigated.
We study the scaling of parallel-matching crossbar schedulers to radices above 100. First, we examine a traditional microarchitecture that implements the matching decision of each input and each output of the crossbar in a separate arbiter block and communicates the matching decisions between the input and the output arbiters through global point-to-point links. Using simple models and experimentation with 90 nm CMOS layouts, we show that this architecture is expensive because the global point-to-point links take up O(N⁴) area, where N is the radix of the crossbar. Next, by observing that the wiring of an arbiter fits in a minimal O(N log N) area, we propose a novel microarchitecture that inverts the locality of wires by orthogonally interleaving the input with the output arbiters, thus lowering the wiring area of the scheduler to O(N² log² N). Using this architecture, the scheduler for a radix-128 FIFO, VOQ, or 2-VC crossbar becomes gate limited, fitting in 3.6, 7.2, and 7.2 mm² respectively, which is a 40, 50, and 70% improvement compared to the traditional microarchitecture. Moreover, the proposed schedulers find a new match in less than 10 ns, thus allowing a minimum packet size below 30 bytes at a 24 Gb/s line rate. Based on these findings, we conclude that crossbar schedulers are feasible even for radices above 100.
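The link between scheduling time and minimum packet size follows from simple arithmetic: a packet must occupy the line for at least as long as the scheduler needs to find the next match. A minimal sketch, with an illustrative function name:

```python
def min_packet_bytes(scheduling_time_ns, line_rate_gbps):
    """Smallest packet (in bytes) whose transmission time covers one
    scheduling decision. Nanoseconds times Gb/s gives bits directly."""
    bits = scheduling_time_ns * line_rate_gbps
    return bits / 8


if __name__ == "__main__":
    print(min_packet_bytes(10, 24))
```

At exactly 10 ns and 24 Gb/s this gives 240 bits, i.e. 30 bytes, so a scheduler that is faster than 10 ns supports a minimum packet below that size.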