After Action Reports (AARs) provide incisive analysis of cyber-incidents. Extracting cyber knowledge from these sources provides security analysts with credible information, which they can use to detect patterns indicative of a cyber-attack. In this paper, we describe a system that extracts information from AARs, aggregates the extracted information by fusing similar entities together, and represents the result in a Cybersecurity Knowledge Graph (CKG). We extract entities by building a customized named entity recognizer called the 'Malware Entity Extractor' (MEE). We then build a neural network to predict how pairs of 'malware entities' are related to each other. Once entity pairs and their relationships have been predicted, we assert the 'entity-relationship set' in the CKG. The next step is to fuse similar entities, which improves the CKG and lets it represent intelligence extracted from multiple documents and reports. The fused CKG thus contains knowledge from multiple AARs, with relationships between entities extracted from separate reports. As a result of this fusion, a security analyst can execute queries and retrieve better answers from the fused CKG than from a knowledge graph without fusion. We also showcase various reasoning capabilities that a security analyst can leverage using our fused CKG.
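The extract-then-assert pipeline above can be sketched minimally. The real MEE is a trained named entity recognizer and the relations are predicted by a neural network; in this illustration a regex lexicon stands in for the NER model, the relation is fixed, and all entity names and the `exploits` relation are illustrative assumptions, not from the paper.

```python
import re
from collections import defaultdict

# Toy stand-in for the 'Malware Entity Extractor' (MEE): the actual system
# is a trained NER model; a regex lexicon illustrates the extraction step.
LEXICON = {
    "malware": re.compile(r"\b(WannaCry|NotPetya)\b"),
    "vulnerability": re.compile(r"\bCVE-\d{4}-\d{4,7}\b"),
}

def extract_entities(text):
    """Return (entity, type) pairs found in an AAR sentence."""
    found = []
    for etype, pattern in LEXICON.items():
        found += [(m, etype) for m in pattern.findall(text)]
    return found

def assert_triples(graph, sentence, relation="exploits"):
    """Relate every malware entity to every vulnerability in a sentence.

    The paper predicts the relation with a neural network; here it is a
    fixed placeholder so the assertion step can be shown end to end.
    """
    ents = extract_entities(sentence)
    malware = [e for e, t in ents if t == "malware"]
    vulns = [e for e, t in ents if t == "vulnerability"]
    for m in malware:
        for v in vulns:
            graph[m].add((relation, v))
    return graph

# A dict of adjacency sets stands in for the CKG triple store.
ckg = defaultdict(set)
assert_triples(ckg, "WannaCry spread by exploiting CVE-2017-0144.")
```

Fusion would then merge nodes such as 'WannaCry' and 'WanaCrypt0r' into one entity, so triples extracted from separate reports attach to the same node.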
As AI systems become more ubiquitous, securing them becomes an emerging challenge. Over the years, with the surge in online social media use and the data available for analysis, AI systems have been built to extract, represent, and use this information. The credibility of information extracted from open sources, however, can often be questionable. Malicious or incorrect information can cause a loss of money, reputation, and resources, and in certain situations, pose a threat to human life. In this paper, we use an ensemble semi-supervised approach to determine the credibility of Reddit posts by estimating a reputation score for each post, ensuring the validity of information ingested by AI systems. We demonstrate our approach in the cybersecurity domain, where security analysts use these systems to identify possible threats by analyzing data scattered across social media websites, forums, blogs, etc.
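The ensemble idea can be sketched as follows. The paper's scorers are semi-supervised models; here three hand-written heuristics stand in for them, and their average is the post's reputation score. Every function name, field, and threshold below is an illustrative assumption, not the paper's method.

```python
# Hedged sketch of an ensemble reputation score for a Reddit post.
# Each scorer returns a credibility estimate in [0, 1]; the ensemble
# averages them. In the real system, trained models replace these rules.

def score_karma(post):
    # Higher author karma -> more credible (scaled into [0, 1]).
    return min(post["author_karma"] / 1000.0, 1.0)

def score_links(post):
    # Posts citing external sources gain credibility.
    return 1.0 if "http" in post["text"] else 0.3

def score_length(post):
    # Very short posts are treated as less informative.
    return min(len(post["text"].split()) / 50.0, 1.0)

SCORERS = [score_karma, score_links, score_length]

def reputation(post):
    """Ensemble reputation score in [0, 1]: mean of the individual scorers."""
    return sum(s(post) for s in SCORERS) / len(SCORERS)

post = {"author_karma": 800, "text": "Patch now: see http://example.com advisory"}
rep = reputation(post)
```

Downstream, an ingestion pipeline would only assert knowledge from posts whose reputation exceeds a chosen threshold.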
Security engineers and researchers use their disparate knowledge and discretion to identify malware present in a system. Sometimes, they may also use previously extracted knowledge and available Cyber Threat Intelligence (CTI) about known attacks to establish a pattern. To aid this process, they need knowledge about malware behavior mapped to the available CTI. Such mappings enrich our representations and also help verify the information. In this paper, we describe how we retrieve malware samples and execute them in a local system. The tracked malware behavior is represented in our Cybersecurity Knowledge Graph (CKG), so that a security professional can reason with the behavioral information present in the graph and draw parallels with existing CTI. We also merge the behavioral information with knowledge extracted from text in CTI sources, such as technical reports and blogs about the same malware, which significantly improves the reasoning capabilities of our CKG.
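The merge step can be sketched with plain triple sets: behavior observed while executing the sample and knowledge extracted from reports about the same malware are unioned into one graph a query can reason over. The malware name, relations, and values below are illustrative, not from the paper.

```python
# Merging sandbox-observed behavior with text-derived CTI for one malware.
# Triples are (subject, relation, object); the set union gives a single
# view, and duplicates across sources collapse (i.e., corroboration).

behavior_triples = {   # observed while executing the sample locally
    ("Emotet", "creates", "schtasks.exe"),
    ("Emotet", "connects_to", "203.0.113.7"),
}

text_triples = {       # extracted from a technical report on the same malware
    ("Emotet", "delivered_by", "phishing email"),
    ("Emotet", "creates", "schtasks.exe"),   # corroborated by both sources
}

merged = behavior_triples | text_triples

def neighbors(graph, subject):
    """All (relation, object) pairs for a subject in the merged graph."""
    return {(r, o) for s, r, o in graph if s == subject}
```

A query over `merged` now answers with both behavioral and report-derived facts, which neither source could provide alone.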
Cyber-defense systems are being developed to automatically ingest Cyber Threat Intelligence (CTI) that contains semi-structured data and/or text to populate knowledge graphs. A potential risk is that fake CTI can be generated and spread through Open-Source Intelligence (OSINT) communities or on the Web to effect a data poisoning attack on these systems. Adversaries can use fake CTI examples as training input to subvert cyber-defense systems, forcing their models to learn incorrect inputs that serve the attackers' malicious needs. In this paper, we show how to automatically generate fake CTI text descriptions using transformers. Given an initial prompt sentence, a public language model like GPT-2, with fine-tuning, can generate plausible CTI text that can mislead cyber-defense systems. We use the generated fake CTI text to perform a data poisoning attack on a Cybersecurity Knowledge Graph (CKG) and a cybersecurity corpus. The attack introduced adverse impacts such as incorrect reasoning outputs, representation poisoning, and corruption of other dependent AI-based cyber-defense systems. We evaluate the generated text with traditional approaches and conduct a human evaluation study with cybersecurity professionals and threat hunters. Based on the study, professional threat hunters were equally likely to consider our fake generated CTI and authentic CTI as true.
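The poisoning mechanism can be sketched without the language model itself: once generated fake CTI passes through the ingestion pipeline, its triples land in the same graph as authentic ones and change query answers. The actor name, relations, and fake claim below are fabricated purely for illustration.

```python
# Sketch of the poisoning effect on a triple-based CKG: a query returns
# different answers before and after fake CTI is ingested, because the
# pipeline cannot distinguish plausible generated text from authentic CTI.

ckg = {
    ("APT-X", "uses", "spearphishing"),     # authentic CTI
}

def query_uses(graph, actor):
    """Techniques the graph attributes to an actor."""
    return {o for s, r, o in graph if s == actor and r == "uses"}

clean_answer = query_uses(ckg, "APT-X")

# An attacker spreads plausible fake CTI (e.g., generated with a fine-tuned
# language model); ingestion asserts it into the same graph.
fake_cti = {("APT-X", "uses", "USB drops")}  # fabricated claim
ckg |= fake_cti

poisoned_answer = query_uses(ckg, "APT-X")   # now includes the fake technique
```

Any downstream system that trains on or reasons over the poisoned graph inherits the fabricated attribution, which is the corruption effect the paper measures.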