Exploiting memory-level parallelism (MLP) is crucial to hide long memory and last-level cache access latencies. While out-of-order (OoO) cores, and techniques building on them, are effective at exploiting MLP, they deliver poor energy efficiency due to their complex hardware and the resulting energy overheads. As energy efficiency becomes the prime design constraint, we investigate low complexity/energy mechanisms to exploit MLP. This work revisits slice-out-of-order (sOoO) cores as an energy-efficient alternative to OoO cores for MLP exploitation. These cores construct slices of MLP-generating instructions and execute them out-of-order with respect to the rest of the instructions. However, the slices and the remaining instructions, by themselves, execute in-order. Though their energy overhead is low compared to full OoO cores, sOoO cores fall considerably behind in terms of MLP extraction. We observe that their dependence-oblivious in-order slice execution causes dependent slices to frequently block MLP generation. To boost MLP generation in sOoO cores, we introduce Freeway, a sOoO core based on a new dependence-aware slice execution policy that tracks dependent slices and keeps them out of the way of MLP extraction. The proposed core incurs minimal area and power overheads, yet approaches the MLP benefits of fully OoO cores. Our evaluation shows that Freeway outperforms the state-of-the-art sOoO core by 12% and is within 7% of the MLP limits of full OoO execution.
Flexible instruction scheduling is essential for performance in out-of-order processors. This is typically achieved by using CAM-based Instruction Queues (IQs) that provide complete flexibility in choosing ready instructions for execution, but at the cost of significant scheduling energy. In this work we seek to reduce the instruction scheduling energy by reducing the depth and width of the IQ. We do so by classifying instructions based on their readiness and criticality, and using this information to bypass the IQ for instructions that will not benefit from its expensive scheduling structures and to delay instructions that will not harm performance. Combined, these approaches allow us to offload a significant portion of the instructions from the IQ to much cheaper FIFO-based scheduling structures without hurting performance. As a result we can reduce the IQ depth and width by half, thereby saving energy. Our design, Delay and Bypass (DNB), is the first design to explicitly address both readiness and criticality to reduce scheduling energy. By handling both classes we are able to achieve 95% of the baseline out-of-order performance while only using 33% of the scheduling energy. This represents a significant improvement over previous designs which addressed only criticality or readiness (91%/89% performance at 74%/53% energy).
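To make the comparison in the last two sentences concrete, the quoted figures can be turned into a rough performance-per-energy ratio. The sketch below is only illustrative arithmetic: it assumes the pairs "91%/89%" and "74%/53%" map, in that order, to the criticality-only and readiness-only designs, and the ratio itself is a hypothetical metric that the abstract does not report.

```python
# Relative performance and scheduling energy, both normalized to the
# baseline out-of-order design (1.0 = baseline), as quoted in the abstract.
# Assumption: the slash-separated pairs map to criticality-only and
# readiness-only designs in the order they are listed.
designs = {
    "DNB (readiness + criticality)": (0.95, 0.33),
    "criticality only": (0.91, 0.74),
    "readiness only": (0.89, 0.53),
}


def perf_per_energy(perf: float, energy: float) -> float:
    """Illustrative figure of merit: relative performance per unit of energy."""
    return perf / energy


ratios = {name: perf_per_energy(p, e) for name, (p, e) in designs.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}")
```

Under these assumptions the ratio comes out at roughly 2.9 for DNB, against about 1.2 and 1.7 for the criticality-only and readiness-only designs.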
In Total Store Order memory consistency (TSO), loads can be speculatively reordered to improve performance. If a load-load reordering is seen by other cores, speculative loads must be squashed and re-executed. In architectures with an unordered interconnection network and directory coherence, this has been the established view for decades. We show, for the first time, that it is not necessary to squash and re-execute speculatively reordered loads in TSO when their reordering is seen. Instead, the reordering can be hidden from other cores by the coherence protocol. The implication is that we can irrevocably bind speculative loads. This allows us to commit reordered loads out-of-order without having to wait (for the loads to become non-speculative) or without having to checkpoint committed state (and rollback if needed), just to ensure correctness in the rare case of some core seeing the reordering. We show that by exposing a reordering to the coherence layer and by appropriately modifying a typical directory protocol we can successfully hide load-load reordering without perceptible performance cost and without deadlock. Our solution is cost-effective and increases the performance of out-of-order commit by a sizable margin, compared to the base case where memory operations are not allowed to commit if the consistency model could be violated.
Speculative execution is necessary for achieving high performance on modern general-purpose CPUs but, starting with Spectre and Meltdown, it has also been proven to cause severe security flaws. In case of a misspeculation, the architectural state is restored to assure functional correctness but a multitude of microarchitectural changes (e.g., cache updates), caused by the speculatively executed instructions, are commonly left in the system. These changes can be used to leak sensitive information, which has led to a frantic search for solutions that can eliminate such security flaws. The contribution of this work is an evaluation of the cost of hiding speculative side-effects in the cache hierarchy, making them visible only after the speculation has been resolved. For this, we compare (for the first time) two broad approaches: i) waiting for loads to become non-speculative before issuing them to the memory system, and ii) eliminating the side-effects of speculation, a solution consisting of invisible loads (Ghost loads) and performance optimizations (Ghost Buffer and Materialization). While previous work, InvisiSpec, has proposed a similar solution to our latter approach, it has done so with only a minimal evaluation and at a significant performance cost. The detailed evaluation of our solutions shows that: i) waiting for loads to become non-speculative is no more costly than the previously proposed InvisiSpec solution, albeit much simpler, non-invasive in the memory system, and stronger security-wise; ii) hiding speculation with Ghost loads (in the context of a relaxed memory model) can be achieved at the cost of 12% performance degradation and 9% energy increase, which is significantly better than the previous state-of-the-art solution.
The number of instructions a processor's instruction queue can examine (depth) and the number it can issue together (width) determine its ability to take advantage of the ILP in an application. Unfortunately, increasing either the width or depth of the instruction queue is very costly due to the content-addressable logic needed to wake up and select instructions out-of-order. This work makes the observation that a large number of instructions have both operands ready at dispatch, and therefore do not benefit from out-of-order scheduling. We leverage this to place such ready-at-dispatch instructions in separate, simpler, in-order FIFO queues for scheduling. With such additional queues, we can reduce the size and width of the expensive out-of-order instruction queue, without reducing the processor's overall issue width and depth. Our design, FIFOrder, is able to steer more than 60% of instructions to the cheaper FIFO queues, providing 50% energy savings over a traditional out-of-order instruction queue design, while delivering 8% higher performance.
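The steering idea described above can be sketched in a few lines. This is a minimal illustration, not the FIFOrder hardware design: the `Instr` type, the `steer` and `dispatch` helpers, and the register names are all hypothetical, and the only rule modeled is the one stated in the abstract, namely that an instruction whose source operands are all ready at dispatch goes to an in-order FIFO while everything else goes to the out-of-order instruction queue.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Instr:
    """Hypothetical instruction record: a name and its source registers."""
    name: str
    srcs: tuple


def steer(instr: Instr, ready_regs: set) -> str:
    """Return 'fifo' if every source operand is ready at dispatch, else 'iq'."""
    return "fifo" if all(reg in ready_regs for reg in instr.srcs) else "iq"


def dispatch(instrs, ready_regs):
    """Split a dispatch group into an in-order FIFO and an out-of-order IQ."""
    fifo, iq = deque(), []
    for instr in instrs:
        if steer(instr, ready_regs) == "fifo":
            fifo.append(instr)  # cheap in-order structure
        else:
            iq.append(instr)    # needs wake-up and select logic
    return fifo, iq


# Example: r1, r2 and r3 hold ready values; r9 is still being produced.
group = [
    Instr("add", ("r1", "r2")),
    Instr("mul", ("r3", "r9")),
    Instr("li", ()),  # no source operands, so trivially ready
]
fifo, iq = dispatch(group, {"r1", "r2", "r3"})
```

In this example `add` and `li` land in the FIFO and `mul` waits in the instruction queue. A real design would also have to handle FIFO capacity and issue arbitration, which this sketch leaves out.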
Asphaltenes are formed during the primary upgrading process of bitumen. Gasification is considered a good option for conversion of the low-value asphaltenes into syngas. Syngas can later be used in the generation of steam for steam-assisted gravity drainage, and the hydrogen from the syngas can be used for upgrading. Understanding the decomposition of asphaltenes, the subsequent formation of soot, and the properties of soot enables optimization of the process of asphaltene gasification. In this work, the effects of feed particle size, temperature, and residence time on the formation of soot particles during the pyrolysis of Athabasca oil sand asphaltenes were investigated. Morphological, structural, and elemental properties of collected soot were also investigated. The experiments were carried out in an atmospheric entrained-flow reactor that was electrically heated to a range of set temperatures between 800 and 1400 °C, and the residence time was varied between 5 and 12 s by controlling the carrier gas nitrogen flow rate. The pyrolysis products were air-cooled in the collection probe after the reactor and passed through a cyclone to separate out particles larger than 10 μm. The particles smaller than 10 μm were subsequently passed through a cascade impactor for segregation of soot and ash particles in the size range from 0.03 to 3 μm. It was found that asphaltenes devolatilize to produce char, light gases, and tar. The different feed particle size ranges used in this study had a negligible effect on the analyzed properties of the soot. The yield of soot formed increased with the pyrolysis temperature. It was observed that the average size of primary soot particles decreases with an increase in temperature. The sulfur and hydrogen contents of soot decrease with temperature as a result of the liberation of sulfur as hydrogen sulfide during the pyrolysis reactions. With increasing residence time, the average size of the primary soot particles increases.
Economical oxygen production is critical to oxy-firing combustion, a carbon capture technology. Oxygen-deficient oxides have been used for absorption of oxygen from air and desorption of oxygen in recycled flue gas for oxy-firing combustion. Cuprous/cupric oxide equilibrium with alumina can be used as an alternative for oxygen production and absorption. In this work, the effect of the spinel phase (CuAl2O4) content on oxygen sorptive/desorptive properties of CuO–CuAl2O4 sorbents has been investigated using thermogravimetric analysis. The desorption rate and oxygen sorption capacity were shown to be dependent upon the amount of alumina addition. The effect of SO2 and H2O in flue gas was investigated using FACTSage over a range of conditions. It was found that, at temperatures above 750 °C, CuO is inert to these species, making it a proper choice for an oxygen carrier. Cyclic stability was also investigated using the same instrument. The molar CuAl2O4 content of 20% was observed to have the most positive cyclic stability effect. A stable morphology was observed in the scanning electron microscopy microstructure of the sorbent with this CuAl2O4 content. Sintering at a lower CuAl2O4 content and attrition because of second-phase agglomeration can destabilize the sorbents.