Datacenter workloads demand high computational capabilities, flexibility, power efficiency, and low cost. It is challenging to improve all of these factors simultaneously. To advance datacenter capabilities beyond what commodity server designs can provide, we have designed and built a composable, reconfigurable fabric to accelerate portions of large-scale software services. Each instantiation of the fabric consists of a 6x8 2-D torus of high-end Stratix V FPGAs embedded into a half-rack of 48 machines. One FPGA is placed into each server, accessible through PCIe, and wired directly to other FPGAs with pairs of 10 Gb SAS cables. In this paper, we describe a medium-scale deployment of this fabric on a bed of 1,632 servers, and measure its efficacy in accelerating the Bing web search engine. We describe the requirements and architecture of the system, detail the critical engineering challenges and solutions needed to make the system robust in the presence of failures, and measure the performance, power, and resilience of the system when ranking candidate documents. Under high load, the large-scale reconfigurable fabric improves the ranking throughput of each server by 95% for a fixed latency distribution or, while maintaining equivalent throughput, reduces the tail latency by 29%.
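The 6x8 2-D torus described above can be illustrated with a minimal sketch. The abstract gives only the dimensions, so the coordinate scheme, the row/column orientation, and the function names `torus_neighbors` and `torus_links` are assumptions made for illustration, not the authors' wiring plan.

```python
# Minimal sketch of a 6x8 2-D torus, assuming each FPGA is addressed by (row, col).
# The orientation (6 rows, 8 columns) and all names here are hypothetical.
ROWS, COLS = 6, 8


def torus_neighbors(row, col, rows=ROWS, cols=COLS):
    """Return the four neighbours of a node, wrapping around at the edges."""
    return [
        ((row - 1) % rows, col),  # north (wraps from row 0 to the last row)
        ((row + 1) % rows, col),  # south
        (row, (col - 1) % cols),  # west (wraps from column 0 to the last column)
        (row, (col + 1) % cols),  # east
    ]


def torus_links(rows=ROWS, cols=COLS):
    """Return the set of undirected links, each stored as a frozenset of two nodes."""
    links = set()
    for r in range(rows):
        for c in range(cols):
            for neighbor in torus_neighbors(r, c, rows, cols):
                links.add(frozenset(((r, c), neighbor)))
    return links


if __name__ == "__main__":
    print(torus_neighbors(0, 0))
    print(len(torus_links()))
```

Under these assumptions every one of the 48 nodes has exactly four distinct neighbours, so the torus has 48 x 4 / 2 = 96 undirected links.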
Datacenter workloads demand high computational capabilities, flexibility, power efficiency, and low cost. It is challenging to improve all of these factors simultaneously. To advance datacenter capabilities beyond what commodity server designs can provide, we designed and built a composable, reconfigurable hardware fabric based on field-programmable gate arrays (FPGAs). Each server in the fabric contains one FPGA, and all FPGAs within a 48-server rack are interconnected over a low-latency, high-bandwidth network. We describe a medium-scale deployment of this fabric on a bed of 1632 servers, and measure its effectiveness in accelerating the ranking component of the Bing web search engine. We describe the requirements and architecture of the system, detail the critical engineering challenges and solutions needed to make the system robust in the presence of failures, and measure the performance, power, and resilience of the system. Under high load, the large-scale reconfigurable fabric improves the ranking throughput of each server by 95% at a desirable latency distribution or reduces tail latency by 29% at a fixed throughput. In other words, the reconfigurable fabric enables the same throughput using only about half the number of servers.
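The step from a 95% per-server throughput gain to "half the number of servers" is simple arithmetic, sketched below. The function name `servers_needed_fraction` is hypothetical, and the sketch assumes total throughput scales linearly with server count, which the abstract does not state explicitly.

```python
# Minimal sketch, assuming total throughput scales linearly with server count.
def servers_needed_fraction(throughput_gain):
    """Fraction of the original servers needed to match the original total throughput.

    throughput_gain is the per-server improvement as a fraction (0.95 for 95%).
    """
    return 1.0 / (1.0 + throughput_gain)


if __name__ == "__main__":
    fraction = servers_needed_fraction(0.95)
    print(f"{fraction:.3f}")  # prints 0.513
```

A server that handles 1.95 times the load replaces 1.95 unaccelerated servers, so roughly 51% of the original fleet suffices.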
In order to support near-term applications of quantum computing, a new compute paradigm has emerged, the quantum-classical cloud, in which quantum computers (QPUs) work in tandem with classical computers (CPUs) via a shared cloud infrastructure. In this work, we enumerate the architectural requirements of a quantum-classical cloud platform, and present a framework for benchmarking its runtime performance. In addition, we walk through two platform-level enhancements, parametric compilation and active qubit reset, that specifically optimize a quantum-classical architecture to support variational hybrid algorithms (VHAs), the most promising applications of near-term quantum hardware. Finally, we show that integrating these two features into the Rigetti Quantum Cloud Services (QCS) platform results in considerable improvements to the latencies that govern algorithm runtime.
To improve the reliability and the energy efficiency of datacenters, as well as to reduce infrastructure costs and environmental impacts, we demonstrated and evaluated the use of a 10 kW Proton Exchange Membrane Fuel Cell (PEMFC) stack and system for powering the servers in a data center. In this study, we designed, tested, and demonstrated a PEMFC system as a Distributed Generation (DG) prime mover that has high reliability and efficiency for both steady-state and dynamic operations. The 10 kW PEMFC stack and system was designed to power a server rack and eliminate the power distribution system in the datacenter. The steady-state electrical properties, such as efficiency and polarization curves, were evaluated. The ramp rate and dynamic response of the PEMFC system to server and system dynamics were also characterized and can be used to determine energy storage requirements and develop optimal control strategies to enable the dynamic load-following capability.

INTRODUCTION

Fuel cell technology is an attractive electrical power generation technology receiving a great deal of recent attention. Fuel cells directly convert fuel to electricity. The direct electrochemical conversion of fuel allows for high fuel-to-electric conversion efficiencies without pollutant emissions. As reliable and environmentally friendly power sources, fuel cells can potentially play a very important role in data centers. A simple way of utilizing fuel cells as data center power sources is to connect them in grid parallel or as backup generators. For
For a native gate set which includes all single-qubit gates, we apply results from symplectic geometry to analyze the spaces of two-qubit programs accessible within a fixed number of gates. These techniques yield an explicit description of this subspace as a convex polytope, presented by a family of linear inequalities themselves accessible via a finite calculation. We completely describe this family of inequalities in a variety of familiar example cases, and as a consequence we highlight a certain member of the "XY-family" for which this subspace is particularly large, i.e., for which many two-qubit programs admit expression as low-depth circuits.

Corollary (Lemma 46). Allowing the parameter of CPHASE to range freely in $0 \le \alpha \le 2\pi$, the sets $P^2_{\mathrm{CPHASE}}$ and $P^3_{\mathrm{CPHASE}}$ are the same as the corresponding sets for $S = \{\mathrm{CZ}\}$. Hence, $P^2_{\mathrm{CPHASE}}$ occupies 0% of the volume of all two-qubit programs, and $L_{\mathrm{CPHASE}} = 3$.

We find the situation to be quite different for XY:

Corollary (Somewhat informal, see footnote 3; Corollary 53, Remark 57). As a function of $\alpha$, the volume of the set $P^2_{\mathrm{XY}_\alpha}$ is maximized at $\alpha = 3\pi/4$, where it contains 75% of randomly sampled two-qubit programs. Correspondingly, $L_{\mathrm{XY}_\alpha}$ is minimized as $L_{\mathrm{XY}_\alpha} = 9/4$. Allowing the parameter of XY to range freely, the set $P^2_{\mathrm{XY}}$ contains ≈96% of all randomly sampled two-qubit programs, with corresponding value $L_{\mathrm{XY}} \approx 2.04$.

We include as appendices an introduction to the mathematics underpinning these results as well as a simpler viewpoint that yields similar qualitative results but is quantitatively inexplicit.

The geometry of two-qubit programs and the canonical decomposition

As motivation, we include a brief treatment of the Euler decomposition of single-qubit programs

3. In particular, the notion of "volume" is different from the usual Haar volume, and so "randomly sampled" also changes meaning.

Accepted in Quantum 2020-03-19.
Published under CC-BY 4.0.

7. Identifying a useful analogue of $Q$ and of $PU(2)^{\otimes 2}$ is the primary inhibitor of generalizing this to higher qubit counts. See [45, Proposition IV.3] for a list of references concerning the provenance of this operator $Q$.
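The quoted values of $L$ in the corollaries above are consistent with a simple weighted average, sketched here. This is a reading of the numbers, not the paper's definition: it assumes that programs needing fewer than two gates have zero volume and that three gates always suffice, so $L = 2p + 3(1 - p)$ where $p$ is the volume fraction of the depth-two set. The function name `expected_depth` is hypothetical.

```python
from fractions import Fraction


def expected_depth(frac_depth2):
    """Expected gate count, assuming every program needs either 2 or 3 gates.

    frac_depth2 is the volume fraction of programs reachable with two gates.
    Assumption (not stated in the excerpt): depth below 2 has zero volume and
    depth 3 reaches everything.
    """
    return 2 * frac_depth2 + 3 * (1 - frac_depth2)


if __name__ == "__main__":
    print(expected_depth(Fraction(0)))        # CPHASE: 0% at depth two
    print(expected_depth(Fraction(3, 4)))     # XY at alpha = 3*pi/4: 75% at depth two
    print(float(expected_depth(Fraction(96, 100))))  # XY, free parameter: about 96%
```

Under these assumptions the three cases give 3, 9/4, and 2.04, matching the values stated for CPHASE, for XY at $\alpha = 3\pi/4$, and for XY with a free parameter (the last only approximately, since the 96% figure is itself approximate).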
Quilc is an open-source, optimizing compiler for gate-based quantum programs written in Quil or QASM, two popular quantum programming languages. The compiler was designed with attention toward NISQ-era quantum computers, specifically recognizing that each quantum gate has a non-negligible and often irrecoverable cost toward a program's successful execution. Quilc's primary goal is to make authoring quantum software a simpler exercise by making architectural details less burdensome to the author. Using Quilc allows one to write programs faster while usually not compromising, and indeed sometimes improving, their execution fidelity on a given hardware architecture. In this paper, we describe many of the principles behind Quilc's design, and demonstrate the compiler with various examples.