In this paper, we propose a new term weighting scheme called Term Frequency-Inverse Corpus Frequency (TF-ICF). It does not require term frequency information from other documents within the document collection, and thus it enables us to generate the document vectors of N streaming documents in linear time. In the context of a machine learning application, unsupervised document clustering, we evaluated the effectiveness of the proposed approach in comparison to five widely used term weighting schemes through extensive experimentation. Our results show that TF-ICF can produce document clusters that are comparable in quality to those generated by the widely recognized term weighting schemes, and that it is significantly faster than those methods.
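The abstract above does not state the TF-ICF formula, so the following is only a minimal sketch of the general idea under stated assumptions: corpus frequencies are precomputed once from a static reference corpus, the weight is `log(1 + tf) * log((N + 1) / (n + 1))`, and tokens are split on whitespace. The function names `build_icf` and `tf_icf_vector` are hypothetical and are not taken from the paper. The sketch illustrates why each streaming document can be vectorized on its own, without consulting the other documents in the stream.

```python
import math
from collections import Counter


def build_icf(reference_corpus):
    """Precompute inverse corpus frequencies from a static reference corpus.

    Assumption: icf(t) = log((N + 1) / (n_t + 1)), where N is the number of
    reference documents and n_t is how many of them contain term t.
    """
    n_docs = len(reference_corpus)
    doc_freq = Counter()
    for doc in reference_corpus:
        # Count each term at most once per reference document.
        doc_freq.update(set(doc.lower().split()))
    icf = {t: math.log((n_docs + 1) / (df + 1)) for t, df in doc_freq.items()}
    # Weight used for terms that never occur in the reference corpus (n_t = 0).
    default_icf = math.log(n_docs + 1)
    return icf, default_icf


def tf_icf_vector(document, icf, default_icf):
    """Build a unit-length sparse vector for one document.

    Only the document itself and the fixed ICF table are consulted, so the
    cost per document does not depend on how many documents have streamed by.
    """
    counts = Counter(document.lower().split())
    weights = {
        t: math.log(1 + tf) * icf.get(t, default_icf)
        for t, tf in counts.items()
    }
    norm = math.sqrt(sum(w * w for w in weights.values()))
    if norm == 0:
        return weights
    return {t: w / norm for t, w in weights.items()}


# Hypothetical toy data for illustration only.
reference = ["the cat sat", "the dog ran", "the bird flew"]
icf_table, default_weight = build_icf(reference)
vec = tf_icf_vector("the cat chased the mouse", icf_table, default_weight)
```

Because the ICF table is fixed before any streaming document arrives, vectorizing N documents takes N independent passes, which is consistent with the linear-time claim in the abstract. The exact weighting and normalization used by the authors may differ from this sketch.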
Despite the best efforts of cyber security analysts, networked computing assets are routinely compromised, resulting in the loss of intellectual property, the disclosure of state secrets, and major financial damages. Anomaly detection methods are beneficial for detecting new types of attacks and abnormal network activity, but such algorithms can be difficult to understand and trust. Network operators and cyber analysts need fast and scalable tools to help identify suspicious behavior that bypasses automated security systems, but operators do not want another automated tool with algorithms they do not trust. Experts need tools to augment their own domain expertise and to provide a contextual understanding of suspicious behavior to help them make decisions. In this paper we present Situ, a visual analytics system for discovering suspicious behavior in streaming network data. Situ provides a scalable solution that combines anomaly detection with information visualization. The system's visualizations enable operators to identify and investigate the most anomalous events and IP addresses, and the tool provides context to help operators understand why they are anomalous. Finally, operators need tools that can be integrated into their workflow and with their existing tools. This paper describes the Situ platform and its deployment in an operational network setting. We discuss how operators are currently using the tool in a large organization's security operations center and present the results of expert reviews with professionals.
Tools for analyzing and visualizing information from multiple, heterogeneous sources have often relied on innovations in statistical methods. The results from purely statistical methods, however, overlook relevant semantic features present within natural language and text-based information. Emerging research in ontology languages (e.g. RDF, RDFS, SUO-KIF, and OWL) offers promising avenues for overcoming these limitations by leveraging existing and future libraries of meta-data and semantic mark-up. Using semantic features (e.g. hypernyms, meronyms, and synonyms) encoded in ontology languages, methods such as keyword search and clustering can be augmented to analyze and visualize documents at conceptually richer levels. We present findings from a hierarchical clustering system modified for ontological indexing and run on a topic-centric test collection of documents, each with fewer than 200 words. Our findings show that ontologies can impose a complete interpretation or subjective clustering onto a document set that is at least as good as meta-word search.