Incoming and outgoing processing for a given TCP connection often execute on different cores: an incoming packet is typically processed on the core that receives the interrupt, while outgoing data processing occurs on the core running the relevant user code. As a result, accesses to read/write connection state (such as TCP control blocks) often involve cache invalidations and data movement between cores' caches. These can take hundreds of processor cycles, enough to significantly reduce performance. We present a new design, called Affinity-Accept, that causes all processing for a given TCP connection to occur on the same core. Affinity-Accept arranges for the network interface to determine the core on which application processing for each new connection occurs, in a lightweight way; it adjusts the card's choices only in response to imbalances in CPU scheduling. Measurements show that for the Apache web server serving static files on a 48-core AMD system, Affinity-Accept reduces time spent in the TCP stack by 30% and improves overall throughput by 24%.
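The idea of letting the network interface pick the core can be illustrated with a toy model. The sketch below hashes a connection's 4-tuple to a core and places the new connection on that core's own queue, so that a worker on the same core picks it up. The CRC32 hash, `NUM_CORES`, and the per-core queue layout are assumptions made for illustration; they are not taken from the abstract and are not the authors' implementation.

```python
import zlib
from collections import deque

NUM_CORES = 4  # hypothetical core count for this illustration


def core_for_flow(src_ip, src_port, dst_ip, dst_port, num_cores=NUM_CORES):
    """Map a connection 4-tuple to a core with a deterministic hash.

    Stand-in for the network card's flow steering; real cards use
    their own hash functions and steering tables.
    """
    key = f"{src_ip}:{src_port}-{dst_ip}:{dst_port}".encode()
    return zlib.crc32(key) % num_cores


# One queue of pending connections per core (an assumed layout).
accept_queues = [deque() for _ in range(NUM_CORES)]


def on_new_connection(flow):
    """Queue the new connection on the core chosen by the flow hash."""
    core = core_for_flow(*flow)
    accept_queues[core].append(flow)
    return core


def accept_on(core):
    """A worker on `core` takes only connections queued for that core."""
    queue = accept_queues[core]
    return queue.popleft() if queue else None
```

Because the mapping is deterministic, every later packet of the same flow hashes to the same core, which is the property the design relies on. The sketch leaves out the rebalancing that the abstract mentions for CPU scheduling imbalances.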
Effective use of CPU data caches is critical to good performance, but poor cache use patterns are often hard to spot using existing execution profiling tools. Typical profilers attribute costs to specific code locations. The costs due to frequent cache misses on a given piece of data, however, may be spread over instructions throughout the application. The resulting individually small costs at a large number of instructions can easily appear insignificant in a code profiler's output. DProf helps programmers understand cache miss costs by attributing misses to data types instead of code. Associating cache misses with data helps programmers locate data structures that experience misses in many places in the application's code. DProf introduces a number of new views of cache miss data, including a data profile, which reports the data types with the most cache misses, and a data flow graph, which summarizes how objects of a given type are accessed throughout their lifetime, and which accesses incur expensive cross-CPU cache loads. We present two case studies of using DProf to find and fix cache performance bottlenecks in Linux. The improvements provide a 16-57% throughput improvement on a range of memcached and Apache workloads.
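A small example may help show why attributing misses to data types can reveal costs that a code profile hides. In the sketch below, the sample records, function names, and type names (`packet_buffer`, `dir_entry`) are all invented for illustration. They are not DProf output and do not reflect DProf's interface.

```python
from collections import Counter

# Hypothetical cache-miss samples: (code location, data type, miss count).
samples = [
    ("recv_path", "packet_buffer", 3),
    ("send_path", "packet_buffer", 3),
    ("xmit_path", "packet_buffer", 3),
    ("alloc_path", "packet_buffer", 3),
    ("free_path", "packet_buffer", 3),
    ("lookup", "dir_entry", 5),
]


def profile(samples, key_index):
    """Sum miss counts grouped by the chosen field of each sample."""
    totals = Counter()
    for sample in samples:
        totals[sample[key_index]] += sample[2]
    return totals


by_code = profile(samples, 0)  # view of a conventional code profiler
by_type = profile(samples, 1)  # data-centric view

print(by_code.most_common(1))  # [('lookup', 5)]
print(by_type.most_common(1))  # [('packet_buffer', 15)]
```

Grouped by code location, no `packet_buffer` site stands out, since each contributes only 3 misses. Grouped by data type, `packet_buffer` accounts for 15 of the 20 misses.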
Small-form-factor, low-power wireless sensors (motes) are convenient to deploy, but lack the bandwidth to capture and transmit raw high-frequency data, such as human voices or neural signals, in real time. Local filtering can help, but we show that the right filter settings depend on changing ambient conditions and network effects such as congestion, which makes them dynamic and unpredictable. Mote collection systems for high-frequency data must support iteratively tuned, deployment-specific filter settings as well as fast sampling. VANGO, our software system for high-frequency data collection, achieves these goals via integrated processing across network tiers. Bandwidth-limited sensor nodes reduce data in network but rely on microservers, which have greater computational capabilities and a wider scope of observation, to plan how. VANGO provides a cross-platform library for data transformation, measurement, and classification; a fast and low-jitter data acquisition system for motes; and a mechanism to control mote and microserver signal processing. With VANGO we have developed new applications: the first acoustic collection system for motes responsive to changing environmental conditions and user interests, and the first neural spike acquisition application capable of supporting a network of nodes.
Existing approaches used to develop compact low-power multichannel wireless neural recording systems range from creating custom-integrated circuits to assembling commercial-off-the-shelf (COTS) PC-based components. Custom-integrated-circuit designs yield extremely compact and low-power devices at the expense of high development and upgrade costs and turn-around times, while assembling COTS-PC-technology yields high performance at the expense of large system size and increased power consumption. To achieve a balance between implementing an ultra-compact custom-fabricated neural transceiver and assembling COTS-PC-technology, an overlay of a neural interface upon the TinyOS-based MICA2 platform is described. The system amplifies, digitally encodes, and transmits neural signals in real time at a rate of 9.6 kbps, while consuming less than 66 mW of power. The neural signals are received and forwarded to a client PC over a serial connection. This data rate can be divided for recording on up to 6 channels, with a resolution of 8 bits/sample. This work demonstrates the strengths and limitations of the TinyOS-based sensor technology as a foundation for chronic remote biological monitoring applications and, thus, provides an opportunity to create a system that can leverage the frequent networking and communications advancements being made by the global TinyOS-development community.
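The per-channel sampling rate implied by these figures can be worked out directly. The sketch below assumes that the full 9.6 kbps is available for sample payload, with no packet or protocol overhead, and that the rate is split evenly across channels. The abstract states neither, so the results are upper bounds rather than reported measurements.

```python
LINK_RATE_BPS = 9600   # 9.6 kbps, as stated in the abstract
BITS_PER_SAMPLE = 8    # resolution stated in the abstract


def per_channel_sample_rate(channels,
                            link_rate_bps=LINK_RATE_BPS,
                            bits_per_sample=BITS_PER_SAMPLE):
    """Upper bound on samples/s per channel under an even split.

    Assumes no framing or protocol overhead (an assumption,
    not a figure from the source).
    """
    return link_rate_bps / (channels * bits_per_sample)


print(per_channel_sample_rate(6))  # 200.0
print(per_channel_sample_rate(1))  # 1200.0
```

Under these assumptions, six channels would each be limited to about 200 samples/s, while a single channel could reach about 1200 samples/s.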
Wireless-enabled processor modules intended for communicating low-frequency phenomena (e.g., temperature, humidity, and ambient light) have been enabled to acquire and transmit multiple biological signals in real time. This was achieved by using computationally efficient data acquisition, filtering, and compression algorithms, and by interfacing the modules with biological interface hardware. The sensor modules can acquire and transmit raw biological signals at a rate of 32 kb/s, which is near the hardware limit of the modules. Furthermore, onboard signal processing enables one channel, sampled at a rate of 4000 samples/s at 12-bit resolution, to be compressed via adaptive differential-pulse-code modulation (ADPCM) and transmitted in real time. In addition, the sensors can be configured to filter and transmit individual time-referenced "spike" waveforms, or to transmit only the spike height and width, which reduces network traffic and extends battery life. The system is capable of acquiring eight channels of analog signals as well as data via an asynchronous serial connection. A back-end server archives the biological data received via networked gateway sensors, and serves them to a client application that enables users to browse recorded data. The system also acquires, filters, and transmits oxygen saturation and pulse rate via a commercial-off-the-shelf interface board. The system architecture can be configured for performing real-time nonobtrusive biological monitoring of humans or rodents. This paper demonstrates that low-power, computation- and bandwidth-constrained wireless-enabled platforms can indeed be leveraged for wireless biosignal monitoring.
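The need for compression follows from the stated numbers: one channel at 4000 samples/s and 12 bits per sample produces more data than the 32 kb/s raw limit. The sketch below checks this, taking 1 kb/s as 1000 bits/s. The 4-bit ADPCM code width is an assumption (a common choice for ADPCM coders); the abstract does not state the width used.

```python
RAW_LIMIT_BPS = 32_000   # 32 kb/s raw limit, taking 1 kb = 1000 bits
SAMPLE_RATE_HZ = 4000    # samples/s, as stated in the abstract
RESOLUTION_BITS = 12     # bits/sample, as stated in the abstract
ADPCM_BITS = 4           # ASSUMPTION: code width not given in the source


def bit_rate(sample_rate_hz, bits_per_sample):
    """Uncompressed bit rate of one channel, in bits per second."""
    return sample_rate_hz * bits_per_sample


raw_bps = bit_rate(SAMPLE_RATE_HZ, RESOLUTION_BITS)
min_ratio = raw_bps / RAW_LIMIT_BPS       # compression needed to fit
adpcm_bps = bit_rate(SAMPLE_RATE_HZ, ADPCM_BITS)

print(raw_bps)                      # 48000
print(min_ratio)                    # 1.5
print(adpcm_bps <= RAW_LIMIT_BPS)   # True
```

The uncompressed stream needs 48 kb/s, so a compression ratio of at least 1.5 is required. With the assumed 4-bit codes the stream would drop to 16 kb/s, before any packet overhead.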