In a contemporary data center, Linux applications often generate large volumes of real-time system call traces, which traditional host-based intrusion detection systems (HIDS) deployed on every single host cannot handle well. Training data mining models on system calls on a single host with static computing and storage capacity is time-consuming, and intermediate datasets cannot be processed efficiently. Maintaining and updating HIDS installed on every physical or virtual host is cumbersome, and comprehensive system call analysis can hardly be performed to detect complex, distributed attacks spanning multiple hosts. Considering these limitations of current system-call-based HIDS, in this article we review the development of system-call-based HIDS and outline future research trends. Algorithms and techniques relevant to system-call-based HIDS are investigated, including feature extraction methods and various data mining algorithms. HIDS dataset issues are discussed, including currently available datasets containing system calls and approaches for researchers to generate new datasets. The application of system-call-based HIDS to current embedded systems is studied, and related works are surveyed. Finally, future research trends are forecast in three respects: reducing the false-positive rate, improving detection efficiency, and enhancing collaborative security.
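As one concrete example of feature extraction from system call traces, the sketch below counts sliding-window n-grams over a trace of syscall names, a technique commonly used in system-call-based HIDS; the trace data and function name here are hypothetical, not taken from any particular surveyed system.

```python
from collections import Counter

def ngram_features(trace, n=3):
    """Count sliding-window n-grams over a system-call trace.

    Each n-gram of consecutive syscall names becomes one feature;
    an anomaly detector can then score a trace by how many of its
    n-grams fall outside a profile learned from normal behavior.
    """
    grams = zip(*(trace[i:] for i in range(n)))
    return Counter(grams)

# Hypothetical trace of syscall names captured from one process.
trace = ["open", "read", "mmap", "read", "write", "close"]
print(ngram_features(trace, n=3))
```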
Sensing coverage is a fundamental problem in wireless sensor networks (WSNs) and has attracted considerable attention. Conventional research on this topic focuses on the 0/1 coverage model, which is only a coarse approximation of the practical sensing model. In this paper, we study the target coverage problem, where the objective is to find the minimum number of sensor nodes in randomly deployed WSNs under a probabilistic sensing model. We analyze the joint detection probability of a target observed by multiple sensors. Based on this theoretical analysis of the detection probability, we formulate the minimum ϵ-detection coverage problem. We prove that the minimum ϵ-detection coverage problem is NP-hard and present an approximation algorithm, the Probabilistic Sensor Coverage Algorithm (PSCA), with provable approximation ratios. To evaluate our design, we analyze the performance of PSCA theoretically and also perform extensive simulations to demonstrate the effectiveness of the proposed algorithm.
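To make the joint detection probability concrete, here is a minimal sketch assuming independent sensor observations and an exponential-decay sensing model; the decay parameter, sensing radius, and function names are illustrative assumptions rather than the paper's exact formulation. Under the ϵ-detection formulation, a target is covered once this joint probability reaches at least ϵ.

```python
import math

def detection_prob(distance, r, beta=0.5):
    # Assumed probabilistic sensing model: detection probability decays
    # exponentially with distance and is zero beyond the sensing radius r.
    # The paper's exact model may differ.
    return math.exp(-beta * distance) if distance <= r else 0.0

def joint_detection_prob(distances, r, beta=0.5):
    # Assuming independent observations, the target is missed only if
    # every sensor misses it:  P_joint = 1 - prod_i (1 - p_i)
    miss = 1.0
    for d in distances:
        miss *= 1.0 - detection_prob(d, r, beta)
    return 1.0 - miss

# A target observed by three sensors at these distances (hypothetical).
print(joint_detection_prob([1.0, 2.5, 4.0], r=5.0))
```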
MapReduce is becoming the state-of-the-art computing paradigm for processing large-scale datasets on clusters of tens to thousands of nodes. It has been widely used in various fields such as e-commerce, Web search, social networks, and scientific computation. Understanding the characteristics of MapReduce workloads is the key to making better configuration decisions and improving system throughput. However, workload characterization of MapReduce, especially in a large-scale production environment, has not been well studied yet. To gain insight into MapReduce workloads, we collected a two-week workload trace from a 2,000-node Hadoop cluster at Taobao, the biggest online e-commerce enterprise in Asia, ranked 14th in the world as reported by Alexa. The workload trace covers 912,157 jobs, logged from Dec. 4 to Dec. 20, 2011. We characterized the workload at the granularity of jobs and tasks, respectively, and concluded with a set of interesting observations. The results of the workload characterization are representative of and generally consistent with data platforms for e-commerce websites, and they can help other researchers and engineers understand the performance and job characteristics of Hadoop in their production environments. In addition, we use these job analysis statistics to derive several implications for potential performance optimization solutions.
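As a simple illustration of job-level workload characterization, the sketch below aggregates per-job statistics from a trace file; the CSV layout and column names are hypothetical stand-ins, since real Hadoop job-history logs use their own format and would need a dedicated parser.

```python
import csv
from statistics import median

def job_stats(path):
    # Aggregate simple per-job statistics from a workload trace.
    # Hypothetical CSV columns: submit_ms, finish_ms, num_map_tasks.
    durations, map_counts = [], []
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            durations.append(float(row["finish_ms"]) - float(row["submit_ms"]))
            map_counts.append(int(row["num_map_tasks"]))
    return {
        "jobs": len(durations),
        "median_duration_ms": median(durations),
        "median_map_tasks": median(map_counts),
    }

# Example: job_stats("taobao_jobs.csv") would summarize one trace file.
```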
Homogenizing the temperature field and the temperature gradient field is very important for many devices, systems, and equipment, such as satellites and electronic devices. This paper discusses the distribution optimization of a limited amount of high-conductivity material with the simulated annealing algorithm to homogenize the temperature field in a two-dimensional heat conduction problem. At the same time, the temperature gradient field is homogenized with a bionic optimization method. The results show that the two optimization targets are consistent to some extent, while the bionic optimization method can save much computing time. In addition, there are threshold values for the amount of high-conductivity material and for the ratio of the high conductivity to the low conductivity, beyond which further increases bring very little improvement in the homogenization of the temperature field and the temperature gradient field.

Keywords: temperature homogenization, temperature gradient homogenization, bionic optimization, simulated annealing algorithm
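As an illustration of the simulated-annealing approach, the sketch below places a fixed budget of high-conductivity cells on a 2-D grid so as to minimize the variance of the temperature field, assuming a crude Jacobi conduction solver with unit volumetric heating and a cold boundary; the grid size, conductivity values, cooling schedule, and all names are illustrative assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K_HI, K_LO, N_HI = 20, 100.0, 1.0, 60  # grid size, conductivities, hi-k budget

def solve_temperature(k, iters=400):
    # Crude Jacobi iteration for steady-state conduction with unit
    # volumetric heating and a cold (T = 0) boundary; a stand-in for
    # a proper finite-volume solver, kept simple for illustration.
    T = np.zeros((N, N))
    for _ in range(iters):
        Tp = np.pad(T, 1)                           # T = 0 outside the domain
        kp = np.pad(k, 1, constant_values=K_LO)
        w = kp[:-2, 1:-1] + kp[2:, 1:-1] + kp[1:-1, :-2] + kp[1:-1, 2:]
        flux = (kp[:-2, 1:-1] * Tp[:-2, 1:-1] + kp[2:, 1:-1] * Tp[2:, 1:-1]
                + kp[1:-1, :-2] * Tp[1:-1, :-2] + kp[1:-1, 2:] * Tp[1:-1, 2:])
        T = (flux + 1.0) / w                        # unit heat source per cell
    return T

def objective(k):
    return solve_temperature(k).var()               # temperature non-uniformity

# Random initial placement of the high-conductivity material.
k = np.full((N, N), K_LO)
k.flat[rng.choice(N * N, size=N_HI, replace=False)] = K_HI

cost, temp = objective(k), 1e-3
for _ in range(200):
    hi = rng.choice(np.flatnonzero(k.ravel() == K_HI))
    lo = rng.choice(np.flatnonzero(k.ravel() == K_LO))
    k.flat[[hi, lo]] = K_LO, K_HI                   # propose swapping one pair
    new = objective(k)
    if new < cost or rng.random() < np.exp((cost - new) / temp):
        cost = new                                  # accept the move
    else:
        k.flat[[hi, lo]] = K_HI, K_LO               # revert
    temp *= 0.98                                    # geometric cooling
print("final temperature variance:", cost)
```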