Existing studies on BitTorrent systems are single-torrent based, although our trace analysis shows that more than 85% of all peers participate in multiple torrents. In addition, these studies are not sufficiently insightful or accurate even as single-torrent models, due to some unrealistic assumptions. Our analysis of representative BitTorrent traffic provides several new findings regarding the limitations of BitTorrent systems: (1) Due to the exponentially decreasing peer arrival rate in reality, service availability in such systems degrades quickly, after which it is difficult for the file to be located and downloaded. (2) Client performance in BitTorrent-like systems is unstable, and fluctuates widely with the peer population. (3) Existing systems can provide unfair services to peers, where peers with high downloading speed tend to download more and upload less. In this paper, we study these limitations on torrent evolution in realistic environments. Motivated by the analysis and modeling results, we further build a graph-based multi-torrent model to study inter-torrent collaboration. Our model quantitatively provides strong motivation for inter-torrent collaboration instead of directly stimulating seeds to stay longer. We also discuss a system design to show the feasibility of multi-torrent collaboration.
Abstract-This paper presents a performance study of BitTorrent-like P2P systems by modeling, based on extensive measurements and trace analysis. Existing studies on BitTorrent systems are single-torrent based and usually assume that the process of request arrivals to a torrent is Poisson-like. However, in reality, most BitTorrent peers participate in multiple torrents, and file popularity changes over time. Our study of representative BitTorrent traffic provides insights into the evolution of single-torrent systems and several new findings regarding the limitations of BitTorrent systems: (1) Due to the exponentially decreasing peer arrival rate in a torrent, the service availability of the corresponding file becomes poor quickly, and eventually it is hard to locate and download this file. (2) Client performance in BitTorrent-like systems is unstable, and fluctuates significantly with the number of online peers. (3) Existing systems can provide unfair services to peers, where a peer with a higher downloading speed tends to download more and upload less. Motivated by the analysis and modeling results, we further propose a graph-based model to study interactions among multiple torrents. Our model quantitatively demonstrates that, for addressing service unavailability in BitTorrent systems, inter-torrent collaboration is much more effective than stimulating seeds to serve longer. An architecture for inter-torrent collaboration under an exchange-based instant incentive mechanism is also discussed and evaluated by simulations.
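The exponentially decreasing peer arrival rate behind finding (1) is commonly written as λ(t) = λ0·e^(−t/τ). The sketch below is a minimal illustration of that model; the values of λ0 and τ are purely hypothetical placeholders, not parameters fitted from the trace analysis.

```python
import math

# Illustrative exponential arrival-rate model: lambda(t) = lambda0 * exp(-t/tau).
# lambda0 (initial peers/day) and tau (decay constant, days) are assumed values.

def arrival_rate(t, lam0=100.0, tau=5.0):
    """Peers arriving per day, t days after the torrent is published."""
    return lam0 * math.exp(-t / tau)

def peers_arrived(t, lam0=100.0, tau=5.0):
    """Cumulative arrivals by day t (the integral of the rate)."""
    return lam0 * tau * (1.0 - math.exp(-t / tau))

# The rate falls to one peer/day at t = tau * ln(lam0); under this model,
# service availability deteriorates quickly after that point because few
# new peers (potential seeds) keep arriving.
t_one_per_day = 5.0 * math.log(100.0)
```

Note that the cumulative arrivals are bounded by λ0·τ, which is one way to see why a torrent's population cannot sustain itself indefinitely under this model.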
As a family of wireless local area network (WLAN) protocols sitting between the physical layer and higher-layer protocols, IEEE 802.11 has to accommodate the features and requirements of both ends. However, current practice addresses the problems of these two layers separately and is far from satisfactory. On one end, due to varying channel conditions, WLANs have to provide multiple physical channel rates to support various signal qualities. A low-channel-rate station not only suffers low throughput, but also significantly degrades the throughput of other stations. On the other end, the power-saving mechanism of 802.11 is ineffective in TCP-based communications, in which the wireless network interface (WNI) has to stay awake to quickly acknowledge senders; hence, energy is wasted on channel listening during idle awake time. In this paper, considering the needs of both ends, we utilize the idle communication power of the WNI to provide a Cooperative Relay Service (CRS) for WLANs with multiple channel rates. We characterize energy efficiency as energy per bit, instead of energy per second. In CRS, a high-channel-rate station relays data frames as a proxy between its neighboring low-channel-rate stations and the Access Point, improving their throughput and energy efficiency. Unlike traditional relaying approaches, CRS compensates a proxy for the energy consumed in data forwarding: the proxy obtains additional channel access time from its clients, increasing its own throughput without compromising its energy efficiency. Extensive experiments are conducted through a prototype implementation and ns-2 simulations to evaluate CRS. The experimental results show that CRS achieves significant performance improvements for both low and high channel rate stations.
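The energy-per-bit metric makes the case for relaying easy to see: a bit relayed twice over fast hops can still cost less energy than a bit sent once over a slow direct link. The numbers below (transmit power, channel rates) are illustrative assumptions, not measurements from the CRS prototype.

```python
# Minimal sketch of the energy-per-bit metric and the two-hop relay trade-off.
# TX_POWER_W and the 1/11 Mbps rates are assumed values for illustration.

TX_POWER_W = 1.4  # assumed radio power while transmitting, in watts

def energy_per_bit(rate_bps, power_w=TX_POWER_W):
    """Joules spent per delivered bit at a given channel rate."""
    return power_w / rate_bps

# Direct path: a distant station stuck at the 1 Mbps fallback rate.
direct = energy_per_bit(1e6)

# Relayed path: station -> proxy and proxy -> AP, both at 11 Mbps.
# Each bit is transmitted twice, once per hop.
relayed = 2 * energy_per_bit(11e6)

# Relaying wins whenever 2/r_fast < 1/r_slow, i.e. the fast rate is more
# than twice the slow rate -- which holds comfortably for 11 vs 1 Mbps.
```

Measured per second, the relaying proxy spends strictly more energy; measured per bit, both the client and the proxy can come out ahead, which is why the abstract frames efficiency as energy per bit.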
In current databases, GPUs are used as dedicated accelerators to process each individual query. Sharing GPUs among concurrent queries is not supported, causing serious resource underutilization. Based on the profiling of an open-source GPU query engine running commonly used single-query data warehousing workloads, we observe that the utilization of main GPU resources is only up to 25%. This underutilization leads to low system throughput. To address the problem, this paper proposes concurrent query execution as an effective solution. To efficiently share GPUs among concurrent queries for high throughput, the major challenge is to provide software support to control and resolve the resource contention incurred by sharing. Our solution relies on GPU query scheduling and device memory swapping policies to address this challenge. We have implemented a prototype system and evaluated it intensively. The experimental results confirm the effectiveness and performance advantage of our approach. By executing multiple GPU queries concurrently, system throughput can be improved by up to 55% compared with dedicated processing.
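The kind of software support the abstract calls for can be sketched as admission control over device memory: queries run concurrently only while their combined memory demand fits on the GPU. This toy scheduler is a hypothetical illustration of the idea, not the prototype's actual scheduling or swapping policy.

```python
from collections import deque

class GpuScheduler:
    """Toy FIFO admission controller for a single GPU's device memory.

    Queries are admitted while their combined memory footprint fits in
    device memory; the rest wait (a real system could also swap them out).
    All names and sizes here are illustrative assumptions.
    """

    def __init__(self, device_mem_mb):
        self.capacity = device_mem_mb
        self.in_use = 0
        self.running = set()
        self.waiting = deque()

    def submit(self, query_id, mem_mb):
        self.waiting.append((query_id, mem_mb))
        self._dispatch()

    def finish(self, query_id, mem_mb):
        self.running.discard(query_id)
        self.in_use -= mem_mb
        self._dispatch()  # freed memory may admit waiting queries

    def _dispatch(self):
        # Admit waiting queries in FIFO order while memory lasts.
        while self.waiting and self.in_use + self.waiting[0][1] <= self.capacity:
            qid, mem = self.waiting.popleft()
            self.running.add(qid)
            self.in_use += mem
```

For example, on a 1000 MB device, two 600 MB queries would run one after the other, while several small queries could share the GPU at once, which is where the throughput gain over dedicated processing comes from.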
Background: Inflammatory factors play a crucial role throughout the development and progression of atherosclerosis, which is considered a chronic vascular inflammatory disease. Luteolin, a natural flavonoid found in many natural medicinal materials, has anti-inflammatory, anti-fibrotic, and other pharmacological effects. Recently, protective effects of luteolin on cardiovascular disease have been reported. However, there is a paucity of studies on its anti-atherosclerotic activity; therefore, the anti-atherosclerosis potential of luteolin remains to be elucidated. Methods: ApoE-/- mice were fed a high-fat diet to induce atherosclerosis and were treated with oral luteolin for 12 weeks. Primary mouse peritoneal macrophages challenged with oxidized low-density lipoprotein (oxLDL) were used for the in vitro mechanistic study. The effectiveness of luteolin in the ApoE-/- mouse model of atherosclerosis was assessed in the aortic sinus and by en face analysis, and the underlying mechanisms were explored by molecular modeling and siRNA-mediated gene silencing. Results: Luteolin remarkably attenuated atherosclerosis in high-fat-diet-fed ApoE-/- mice by alleviating inflammation. We further found that luteolin decreased oxLDL-induced inflammation in vitro by inhibiting signal transducer and activator of transcription 3 (STAT3). Molecular modeling analysis indicated that luteolin interacts with STAT3 primarily through hydrogen bonding. Conclusion: Luteolin could be a promising candidate molecule against atherosclerosis, and STAT3 may be a potential therapeutic target for preventing the development of atherosclerosis.
Abstract-Performance degradation of memory-intensive programs caused by the LRU policy's inability to handle weak-locality data accesses in the last-level cache is increasingly serious, for two reasons. First, the last-level cache remains in the CPU's critical path, where only simple management mechanisms, such as LRU, can be used, precluding more sophisticated hardware mechanisms that could address the problem. Second, the commonly used shared-cache structure of multi-core processors has made this critical path even more performance-sensitive due to intensive inter-thread contention for shared cache resources. Researchers have recently made efforts to address the problem with the LRU policy by partitioning the cache using hardware or OS facilities guided by run-time locality information. Such approaches often rely on special hardware support or lack sufficient accuracy. In contrast, for a large class of programs, the locality information can be accurately predicted if access patterns are recognized through small training runs at the data-object level. To achieve this goal, we present a system-software framework referred to as Soft-OLP (Software-based Object-Level cache Partitioning). We first collect per-object reuse distance histograms and inter-object interference histograms via memory-trace sampling. With several low-cost training runs, we are able to determine the locality patterns of data objects. For the actual runs, we categorize data objects into different locality types and partition the cache space among data objects with a heuristic algorithm, in order to reduce cache misses through segregation of contending objects. The object-level cache partitioning framework has been implemented with a modified Linux kernel and tested on a commodity multi-core processor.
Experimental results show that, in comparison with a standard L2 cache managed by LRU, Soft-OLP significantly reduces execution time by reducing L2 cache misses across inputs for a set of single- and multi-threaded programs from the SPEC CPU2000 benchmark suite, the NAS benchmarks, and a computational-kernel set.
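The per-object reuse-distance histograms that Soft-OLP collects during training runs can be sketched as follows. Reuse distance here means the number of distinct addresses touched between two accesses to the same address; this toy version works on a raw address trace and omits the object attribution and trace sampling that the framework actually performs.

```python
from collections import Counter

def reuse_distance_histogram(trace):
    """Map each reuse distance to its access count ('inf' = cold first touch).

    Simple LRU-stack implementation: the depth of an address in the stack
    at reuse time is its reuse distance. O(n^2) on purpose, for clarity.
    """
    hist = Counter()
    stack = []  # LRU stack: most recently used address at the end
    for addr in trace:
        if addr in stack:
            depth = len(stack) - 1 - stack.index(addr)
            hist[depth] += 1
            stack.remove(addr)
        else:
            hist['inf'] += 1  # first touch: infinite reuse distance
        stack.append(addr)
    return hist
```

A histogram concentrated at small distances marks an object with strong locality (worth caching); mass at distances larger than the cache's capacity in lines marks the weak-locality objects whose accesses LRU mishandles, which is exactly what the partitioning heuristic uses to segregate contending objects.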
Abstract. Multiprocessors based on simultaneous multithreaded (SMT) or multicore (CMP) processors continue to gain a significant share in both high-performance and mainstream computing markets. In this paper we evaluate the performance of OpenMP applications on these two parallel architectures. We use detailed hardware metrics to identify architectural bottlenecks. We find that the high level of resource sharing in SMTs results in performance complications when more than one thread is assigned to a single physical processor. CMPs, on the other hand, are an attractive alternative. Our results show that exploiting the multiple processor cores on each chip yields significant performance benefits. We evaluate an adaptive, run-time mechanism that provides limited performance improvements on SMTs; however, the inherent bottlenecks remain difficult to overcome. We conclude that out-of-the-box OpenMP code scales better on CMPs than on SMTs. To maximize the efficiency of OpenMP on SMTs, new capabilities are required of the runtime environment and/or the programming interface.