As the role of information and communication technologies in our lives steadily grows, software security becomes a major concern, both to protect systems against malicious attempts and to avoid irreparable damage. With the advent of data-driven techniques, there is growing interest in leveraging machine learning (ML) as a software assurance method for building trustworthy software systems. In this study, we examine how to predict software vulnerabilities from source code, prior to release, by employing ML. To this end, we develop a source code representation method that enables intelligent analysis of the Abstract Syntax Tree (AST) form of source code, and we then investigate whether ML can distinguish vulnerable from nonvulnerable code fragments. For a comprehensive performance evaluation, we use a public dataset containing a large number of function-level, real-world code fragments mined from open-source projects and carefully labeled by vulnerability type, if any. We demonstrate the effectiveness of the proposed method for vulnerability prediction from source code through exhaustive and realistic experiments under different regimes, in comparison with state-of-the-art methods.
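The abstract does not spell out how the AST is encoded for the learner. As a minimal illustrative sketch only (not the authors' method), the following Python snippet uses the standard `ast` module to turn each function into a bag-of-node-types vector, a simple fixed-length representation that a classifier could consume; the sample functions and vocabulary construction are assumptions for demonstration.

```python
import ast
from collections import Counter

def ast_node_histogram(source: str) -> Counter:
    """Parse a code fragment and count occurrences of each AST node type."""
    tree = ast.parse(source)
    return Counter(type(node).__name__ for node in ast.walk(tree))

def to_feature_vector(hist: Counter, vocabulary: list) -> list:
    """Project a node-type histogram onto a fixed vocabulary so every
    function yields a vector of the same length, usable by an ML model."""
    return [hist.get(name, 0) for name in vocabulary]

# Toy example: in practice the vocabulary would be built from the whole
# training corpus and the vectors fed to a vulnerability classifier.
samples = [
    "def f(x):\n    return x * 2",
    "def g(items):\n    for i in items:\n        print(i)",
]
hists = [ast_node_histogram(s) for s in samples]
vocab = sorted({name for h in hists for name in h})
vectors = [to_feature_vector(h, vocab) for h in hists]
print(vocab)
print(vectors)
```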
Effective protection against cyber-attacks requires constant monitoring and analysis of system data in an IT infrastructure, such as log files and network packets, which may contain private and sensitive information. Security operation centers (SOCs), which are established to detect, analyze, and respond to cyber-security incidents, often employ detection models, either for known attack types or for anomalies, and apply them to the system data. SOCs are also motivated to keep their models private, both to capitalize on models that embody their proprietary expertise and to protect their detection strategies against adversarial machine learning. In this paper, we develop a protocol for privately evaluating detection models on system data, in which the privacy of both the system data and the detection models is protected, and information leakage is either prevented altogether or quantifiably reduced. Our main approach is to provide end-to-end encryption for the system data and detection models using lattice-based cryptography that allows homomorphic operations over ciphertexts. Our experiments on recent datasets demonstrate that the proposed privacy-preserving intrusion detection system is feasible in terms of execution times and bandwidth requirements and reliable in terms of accuracy.
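The abstract does not give the concrete scheme parameters or model form. As a rough sketch of the general idea only, assuming the TenSEAL library with the lattice-based CKKS scheme and a hypothetical linear scoring model, the snippet below shows a data owner encrypting features extracted from system data and a SOC evaluating a detection score directly on the ciphertext; all feature values and weights are made up for illustration.

```python
import tenseal as ts

# Data-owner side: set up a lattice-based CKKS context and encrypt the
# features extracted from system data (e.g., a log entry or network flow).
# In a real deployment the secret key stays with the data owner; here a
# single context is used to keep the sketch self-contained.
context = ts.context(ts.SCHEME_TYPE.CKKS,
                     poly_modulus_degree=8192,
                     coeff_mod_bit_sizes=[60, 40, 40, 60])
context.global_scale = 2 ** 40
context.generate_galois_keys()

features = [0.12, 0.87, 0.44, 0.05]      # hypothetical normalized features
enc_features = ts.ckks_vector(context, features)

# SOC side: evaluate a (here, linear) detection model homomorphically.
# The SOC operates on ciphertexts and never sees the plaintext data.
weights = [0.9, -1.3, 0.5, 2.1]          # hypothetical model parameters
bias = 0.2
enc_score = enc_features.dot(weights) + bias

# Data-owner side: decrypt only the resulting detection score.
score = enc_score.decrypt()[0]
print("detection score:", score)
```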
As machine learning and artificial intelligence (ML/AI) become more popular and advanced, there is a growing desire to turn sensitive data into valuable information via ML/AI techniques while revealing only what the concerned parties allow, or without revealing any information about the data to third parties. Collaborative ML approaches such as federated learning (FL) help address these needs and concerns by providing a way to use sensitive data without disclosing its critically sensitive features. In this paper, we provide a detailed analysis of the state of the art in collaborative ML from a privacy perspective. A detailed threat model and security and privacy considerations are given for each collaborative method. We analyze in depth the Privacy Enhancing Technologies (PETs), covering secure multi-party computation (SMPC), homomorphic encryption (HE), differential privacy (DP), and confidential computing (CC), in the context of collaborative ML. We introduce a guideline for selecting privacy-preserving technologies for collaborative ML, aimed at privacy practitioners. This study constitutes the first survey to provide an in-depth focus on collaborative ML requirements and constraints for privacy solutions, while also providing guidelines on the selection of PETs.
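To make the collaborative setting concrete, here is a minimal sketch (not taken from the survey) of federated averaging, the core FL aggregation rule: each client trains locally on its own data and shares only model parameters, which the server averages weighted by client dataset size. The linear model, client data, and hyperparameters are assumptions for illustration.

```python
import numpy as np

def local_update(weights, X, y, lr=0.1, epochs=5):
    """One client's local training: gradient descent on a linear model.
    The raw data (X, y) never leaves the client."""
    w = weights.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

def federated_average(client_weights, client_sizes):
    """Server-side aggregation: average local models weighted by the
    number of samples each client holds (the FedAvg rule)."""
    total = sum(client_sizes)
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
clients = []
for _ in range(3):                              # three hypothetical clients
    X = rng.normal(size=(50, 2))
    y = X @ true_w + rng.normal(scale=0.1, size=50)
    clients.append((X, y))

global_w = np.zeros(2)
for _ in range(20):                             # communication rounds
    local_ws = [local_update(global_w, X, y) for X, y in clients]
    global_w = federated_average(local_ws, [len(y) for _, y in clients])

print("estimated weights:", global_w)
```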