The explosive growth of Internet-connected devices will soon result in a flood of generated data, which will increase the demand for network bandwidth as well as compute power to process that data. Consequently, there is a need for more energy-efficient servers to empower both traditional centralized Cloud data centers and emerging decentralized data centers at the Edges of the Cloud. In this paper, we present our approach, which aims at developing a new class of micro-servers, the UniServer, that exceed the conservative energy and performance scaling boundaries by introducing novel mechanisms at all layers of the design stack. The main idea lies in recognizing the intrinsic hardware heterogeneity and developing mechanisms that automatically expose the unique, varying capabilities of each hardware component within commercial micro-servers and allow their operation at new, extended operating points. Low-overhead schemes are employed to monitor and predict the hardware behavior and report it to the system software. The system software, including a virtualization and resource-management layer, is responsible for optimizing the system operation in terms of energy or performance, while guaranteeing non-disruptive operation under the extended operating points. Our characterization results on a 64-bit ARMv8 micro-server in a 28nm process reveal large voltage margins, in terms of Vmin variation, among the 8 cores of the CPU chip, among 3 different sigma chips, and among different benchmarks, with the potential to obtain up to 38.8% energy savings. Similarly, DRAM characterizations show that the refresh rate and voltage can be relaxed by 43x and 5%, respectively, leading to 23.2% power savings on average.
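As a back-of-the-envelope illustration (not the paper's measurement methodology), the roughly quadratic dependence of dynamic CMOS power on supply voltage shows why reclaiming even a modest Vmin margin pays off. The voltage figures below are hypothetical, chosen only to make the arithmetic concrete:

```python
# Sketch of the classic CMOS dynamic power model P = C * V^2 * f.
# Numbers are illustrative, not measurements from the paper.

def dynamic_power(capacitance, voltage, frequency):
    """Dynamic switching power: P = C * V^2 * f."""
    return capacitance * voltage ** 2 * frequency

def energy_savings(v_nominal, v_reduced):
    """Fractional dynamic-energy savings at fixed capacitance and frequency."""
    return 1.0 - (v_reduced / v_nominal) ** 2

# Hypothetical example: running at 0.9 V instead of a 1.0 V nominal
# supply saves 1 - 0.81 = 19% of dynamic energy.
print(round(energy_savings(1.0, 0.9), 2))  # -> 0.19
```

Because the voltage term is squared, a small supply reduction yields a disproportionately large energy reduction, which is why exposing per-core and per-chip Vmin margins can add up to the large savings reported above.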
This work performs a thorough characterization and analysis of the open-source Lucene search library. The article describes in detail the architecture, functionality, and micro-architectural behavior of the search engine, and investigates prominent online document search research issues. In particular, we study how intra-server index partitioning affects the response time and throughput, explore the potential use of low-power servers for document search, and examine the sources of performance degradation and the causes of tail latencies. Some of our main conclusions are the following: (a) intra-server index partitioning can reduce tail latencies, but with diminishing benefits as incoming query traffic increases; (b) low-power servers, given enough partitioning, can provide the same average and tail response times as conventional high-performance servers; (c) index search is a CPU-intensive, cache-friendly application; and (d) C-states are the main culprits for performance degradation in document search.

19:2 Z. Hadjilambrou et al.

Search services are required to provide tight QoS guarantees, such as tail latencies below 500ms [2], even at peak traffic loads. Previous work aims at improving the latency, efficiency, and cost of operation of search services. In the work of Meisner et al. [27], full-system power management is evaluated for a web search workload. To improve energy efficiency, Lo et al. [20] proposed running each server just fast enough to satisfy global latency requirements, whereas Vamanan et al. [33] proposed to exploit time slack by slowing down individual sub-queries. The possibility of using mobile cores for web search for improved cost and energy efficiency is studied in the work of Reddi et al. [30]. Ren et al. [31] examined how web search can benefit from heterogeneous cores, whereas Haque et al. [10] and Jeon et al. [15] looked at adaptive parallelism for improving response times.
Work stealing for meeting web search target latency is proposed by Li et al. [17]. Hsu et al. [14] propose a turbo-boost framework that increases CPU voltage and frequency at fine-grain time intervals to reduce the latency of computationally heavy search queries. Other work has collocated search applications with other types of workloads to increase data center utilization [25, 26, 35].

This article presents a thorough top-down characterization of an open-source search engine to improve the overall understanding of search engines. In particular, this work presents a characterization of the Lucene-based Nutch web search benchmark [8] on real hardware, providing insights about the application-level and micro-architectural-level behavior of this benchmark. This workload is based on the popular Lucene document search engine. Previous characterization efforts of this benchmark focused only on the query-stream characterization [34] and micro-architectural characterization [8]. Another work conducted with the Nutch benchmark [9] evaluated the performance of intra-server index partitioning and slower cores. However, that work used a small inde...
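The partitioning effect described in finding (a) can be illustrated with a toy fan-out model (a hypothetical service-time distribution, not the article's methodology): a partitioned query is answered only when its slowest partition responds, so query latency is the maximum over partition latencies.

```python
import random

# Toy fan-out model (hypothetical, not the article's measurements): a
# query is split across N index partitions, each searching 1/N of the
# index with random service-time jitter; the query completes when the
# slowest partition replies.

def simulate_query(num_partitions, rng, total_work=100.0):
    per_partition = total_work / num_partitions
    return max(per_partition * rng.uniform(0.8, 1.5)
               for _ in range(num_partitions))

def tail_latency(num_partitions, queries=2000, pct=0.99, seed=42):
    rng = random.Random(seed)
    samples = sorted(simulate_query(num_partitions, rng)
                     for _ in range(queries))
    return samples[int(pct * queries)]

for parts in (1, 4, 16):
    print(f"{parts:2d} partitions: p99 = {tail_latency(parts):6.1f}")
```

In this sketch, going from 1 to 4 partitions cuts the p99 tail sharply, while going from 4 to 16 helps far less, because taking the maximum over more partitions partly offsets the smaller per-partition work. The model deliberately ignores queueing, so it cannot capture the further erosion of the benefit under rising query traffic that the article reports.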