Science networks and their hosted applications require large and frequent data transfers, but these transfers are subject to network performance degradation, including queuing delays and packet drops. Complex network dynamics, along with limited instrumentation access, complicate the creation of an accurate method for predicting different performance aspects of data transfers. In this study, we develop a lightweight machine learning tool to predict end-to-end packet retransmission in science flows of arbitrary size. We also identify the minimum set of path and host measurements needed as input features for our predictor to achieve high accuracy. In our evaluation, the predictor demonstrated low training times and provided accurate estimates (97-99%) of packet retransmissions for data transfers of arbitrary sizes. The results also show that our solution predicted retransmit behavior reasonably well (66%) even for previously unseen data, provided the training and testing datasets had similar statistics.
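The abstract above does not specify the model or feature set used; as an illustrative sketch only, the idea of predicting retransmission behavior from a small set of path and host measurements can be shown with a minimal 1-nearest-neighbour classifier over hypothetical features (RTT, loss rate, host CPU load). The feature names, values, and model choice here are assumptions for illustration, not the paper's method.

```python
# Illustrative sketch (not the paper's implementation): classify whether a
# flow is likely to see packet retransmissions, given hypothetical
# path/host features (RTT in ms, loss rate, host CPU load). A simple
# 1-nearest-neighbour rule stands in for the lightweight ML predictor.

def predict_retransmit(train, query):
    """Return the label of the training sample closest to `query`.

    `train` is a list of (features, label) pairs; features are
    equal-length tuples of numbers, and label is 1 (retransmissions
    expected) or 0 (none expected).
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    _, label = min(train, key=lambda pair: sq_dist(pair[0], query))
    return label

# Hypothetical training data: (RTT ms, loss rate, CPU load) -> label.
training = [
    ((10.0, 0.000, 0.2), 0),
    ((12.0, 0.001, 0.3), 0),
    ((80.0, 0.020, 0.9), 1),
    ((95.0, 0.050, 0.8), 1),
]

# A lossy, high-RTT path lands nearest a retransmitting sample.
print(predict_retransmit(training, (90.0, 0.030, 0.85)))  # -> 1
```

In practice one would train on measured flows and normalize features; the point of the sketch is only that a compact feature vector of path/host measurements can drive the prediction.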
Traditional intrusion detection systems are not adaptive enough to cope with the dynamic characteristics of cloud-hosted virtual infrastructures. This makes them unable to address new cloud-oriented security issues. In this paper we introduce SAIDS, a self-adaptable intrusion detection system tailored for cloud environments. SAIDS is designed to reconfigure its components based on environmental changes. A prototype of SAIDS is described.
Scientific computing sometimes involves computation on sensitive data. Depending on the data and the execution environment, the HPC (high-performance computing) user or data provider may require confidentiality and/or integrity guarantees. To study the applicability of hardware-based trusted execution environments (TEEs) to enable secure scientific computing, we deeply analyze the performance impact of AMD SEV and Intel SGX for diverse HPC benchmarks including traditional scientific computing, machine learning, graph analytics, and emerging scientific computing workloads. We observe three main findings: 1) SEV requires careful memory placement on large-scale NUMA machines (1×-3.4× slowdown without and 1×-1.15× slowdown with NUMA-aware placement), 2) virtualization (a prerequisite for SEV) results in performance degradation for workloads with irregular memory accesses and large working sets (1×-4× slowdown compared to native execution for graph applications), and 3) SGX is inappropriate for HPC given its limited secure memory size and inflexible programming model (1.2×-126× slowdown over insecure execution). Finally, we discuss forthcoming new TEE designs and their potential impact on scientific computing.
Research networks are designed to support high-volume scientific data transfers that span multiple network links. Like any other network, research networks experience anomalies. Anomalies are deviations from profiles of normality in a research network's traffic levels. Diagnosing anomalies is critical both for network operators and users (e.g., scientists). In this paper we present Flowzilla, a general framework for detecting and quantifying anomalies in scientific data transfers of arbitrary size. Flowzilla incorporates Random Forest Regression (RFR) for predicting the size of data transfers and utilizes an adaptive threshold mechanism for detecting outliers. Our results demonstrate that our framework achieves up to 92.5% detection accuracy. Furthermore, we are able to predict data transfer sizes up to 10 weeks after training with accuracy above 90%.
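The Flowzilla abstract pairs a size predictor with an adaptive threshold for outlier detection. As a minimal sketch (not Flowzilla itself), the adaptive-threshold idea can be illustrated on a stream of prediction errors: flag any error that exceeds the running mean plus k standard deviations of past errors. The error values, warm-up length, and k value below are assumptions for illustration.

```python
# Illustrative sketch (not Flowzilla's implementation): adaptive-threshold
# anomaly detection over prediction errors. The threshold adapts to the
# running statistics of all prior errors (mean + k * stdev, k assumed 3).

import statistics

def detect_anomalies(errors, k=3.0, warmup=5):
    """Return indices of errors exceeding mean + k*stdev of prior errors.

    The first `warmup` points are used only to seed the statistics.
    """
    flagged = []
    for i, e in enumerate(errors):
        if i >= warmup:
            history = errors[:i]
            threshold = statistics.mean(history) + k * statistics.stdev(history)
            if e > threshold:
                flagged.append(i)
    return flagged

# Hypothetical transfer-size prediction errors (MB): one large outlier.
errs = [1.0, 1.2, 0.9, 1.1, 1.0, 1.3, 9.5, 1.1]
print(detect_anomalies(errs))  # -> [6]
```

Because the threshold is recomputed from the observed history, it rises and falls with normal traffic variability instead of relying on a fixed cutoff, which is the motivation for an adaptive mechanism in this setting.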
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations: citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.