We propose methods for reducing the energy consumed by snoop requests in snoopy bus-based symmetric multiprocessor (SMP) systems. Observing that a large fraction of snoops do not find copies in many of the other caches, we introduce JETTY, a small, cache-like structure. A JETTY is introduced in-between the bus and the L2 backside of each processor. There it filters the vast majority of snoops that would not find a locally cached copy. Energy is reduced as accesses to the much more energy demanding L2 tag arrays are decreased. No changes in the existing coherence protocol are required and no performance loss is experienced. We evaluate our method on a 4-way SMP server using a set of shared-memory applications. We demonstrate that a very small JETTY filters 74% (average) of all snoop-induced tag accesses that would miss. This results in an average energy reduction of 29% (range: 12% to 40%) measured as a fraction of the energy required by all L2 accesses (both tag and data arrays).
On-chip network is becoming critical to the scalability of future many-core architectures. Recently, nanophotonics has been proposed for on-chip networks because of its low latency and high bandwidth. However, nanophotonics has relatively high static power consumption, which can lead to inefficient architectures. In this work, we propose FlexiShare -a nanophotonic crossbar architecture that minimizes static power consumption by fully sharing a reduced number of channels across the network. To enable efficient global sharing, we decouple the allocation of the channels and the buffers, and introduce novel photonic token-stream mechanism for channel arbitration and credit distribution.The flexibility of FlexiShare introduces additional router complexity and electrical power consumption. However, with the reduced number of optical channels, the overall power consumption is reduced without loss in performance. Our evaluation shows that the proposed token-stream arbitration applied to a conventional crossbar design improves network throughput by 5.5× under permutation traffic. In addition, FlexiShare achieves similar performance as a token-stream arbitrated conventional crossbar using only half the amount of channels under balanced, distributed traffic. With the extracted trace traffic from MineBench and SPLASH-2, FlexiShare can further reduce the amount of channels by up to 87.5%, while still providing better performance -resulting in up to 72% reduction in power consumption compared to the best alternative.
We introduce a new technique for determining a mobile phone's indoor location even when Wi-Fi infrastructure is unavailable or sparse. Our technique is based on a new ambient sound fingerprint called the Acoustic Background Spectrum (ABS). An ABS serves well as a room fingerprint because it is compact, easily computed, robust to transient sounds, and surprisingly distinctive. As with other fingerprint-based localization techniques, location is determined by measuring the current fingerprint and then choosing the "closest" fingerprint from a database. An experiment involving 33 rooms yielded 69% correct fingerprint matches meaning that, in the majority of observations, the fingerprint was closer to a previous visit's fingerprint than to any fingerprints from the other 32 rooms. An implementation of ABS-localization called Batphone is publicly available for Apple iPhones. We used Batphone to show the benefit of using ABS-localization together with a commercial WiFi-based localization method. In this second experiment, adding ABS improved room-level localization accuracy from 30% (Wi-Fi only) to 69% (Wi-Fi and ABS). While Wi-Fibased localization has difficulty distinguishing nearby rooms, Batphone performs just as well with nearby rooms; it can distinguish pairs of adjacent rooms with 92% accuracy.
We present a new parallel code for computing the dynamical evolution of collisional N-body systems with up to N ∼ 10 7 particles. Our code is based on the Hénon Monte Carlo method for solving the Fokker-Planck equation, and makes assumptions of spherical symmetry and dynamical equilibrium. The principal algorithmic developments involve optimizing data structures, and the introduction of a parallel random number generation scheme, as well as a parallel sorting algorithm, required to find nearest neighbors for interactions and to compute the gravitational potential. The new algorithms we introduce along with our choice of decomposition scheme minimize communication costs and ensure optimal distribution of data and workload among the processing units. Our implementation uses the Message Passing Interface (MPI) library for communication, which makes it portable to many different supercomputing architectures. We validate the code by calculating the evolution of clusters with initial Plummer distribution functions up to core collapse with the number of stars, N, spanning three orders of magnitude, from 10 5 to 10 7 . We find that our results are in good agreement with self-similar core-collapse solutions, and the core collapse times generally agree with expectations from the literature. Also, we observe good total energy conservation, within 0.04% throughout all simulations. We analyze the performance of the code, and demonstrate near-linear scaling of the runtime with the number of processors up to 64 processors for N = 10 5 , 128 for N = 10 6 and 256 for N = 10 7 . The runtime reaches saturation with the addition of processors beyond these limits, which is a characteristic of the parallel sorting algorithm.The resulting maximum speedups we achieve are approximately 60×, 100×, and 220×, respectively.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.