The widespread use of the internet generates huge volumes of data that require proper organization, which motivates text categorization. Early work assumed that a document describes a single category; it was soon realized that a document can describe multiple categories simultaneously. This scenario motivates multi-label classification, a supervised learning approach that assigns a predefined set of labels to an object based on its characteristics. First used in text categorization, it soon became the method of choice for a wide range of applications such as marketing, multimedia annotation, and bioinformatics. The two most common approaches to multi-label classification are problem transformation, which converts multi-label data into single-label form so that existing single-label classifiers can be reused, and algorithm adaptation, which designs classifiers that handle multi-label data directly. Another popular approach is an ensemble of multiple classifiers that takes a vote over all of them. These approaches are also grouped as algorithm-independent and algorithm-dependent. Based on the results produced, a suitable evaluation metric is chosen, either example-based or label-based, depending on whether the prediction is a binary assignment or a ranking. Every approach offers benefits and drawbacks, such as loss of label dependency in transformation, complexity in adaptation, and improved results with ensembles, which should be considered when designing the underlying application.
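The problem-transformation approach above can be illustrated with binary relevance, one common transformation strategy: the multi-label dataset is split into one binary dataset per label, each of which any single-label classifier can then learn. This is a minimal sketch; the data and function names are illustrative, not from the paper.

```python
# Binary relevance sketch: convert a multi-label dataset into one
# binary (label present / absent) dataset per label.

def binary_relevance_split(samples, label_set):
    """For each label, build a binary dataset of (features, 0/1) pairs."""
    datasets = {}
    for label in sorted(label_set):
        datasets[label] = [(x, 1 if label in labels else 0)
                           for x, labels in samples]
    return datasets

# Toy multi-label data: each sample is (features, set of labels).
samples = [
    ("document about sports and health", {"sports", "health"}),
    ("document about politics",          {"politics"}),
    ("document about health policy",     {"health", "politics"}),
]

per_label = binary_relevance_split(samples, {"sports", "health", "politics"})
for x, y in per_label["health"]:
    print(y, x)
```

Each resulting binary dataset is independent, which is exactly why this transformation loses label dependency, the drawback noted above.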
Abstract: One of the most significant threats faced by enterprise networks today comes from Bots. A Bot is a program that operates as an agent for a user and runs automated tasks over the internet at a much higher rate than would be possible for a human alone. A collection of Bots in a network used for malicious purposes is referred to as a Botnet. Bot attacks range from localized attacks such as key-logging to network-intensive attacks such as Distributed Denial of Service (DDoS). In this paper, we propose a novel approach to detect and combat Bots. The proposed solution adopts a two-pronged strategy, which we classify into a standalone algorithm and a network algorithm. The standalone algorithm runs independently on each node of the network; it monitors the active processes on the node and tries to identify Bot processes using parameters such as response time and the ratio of output to input traffic. If a suspicious process is identified, the network algorithm is triggered. The network algorithm then analyzes conversations to and from the hosts of the network using transport-layer flow records, and from these it tries to deduce the Bot pattern as well as Bot signatures, which can subsequently be used by the standalone algorithm to thwart Bot processes at their very onset. Vineet Agarwal holds a B.Tech. in computer engineering from V.J.T.I. and an M.S. in Engineering Management from Santa Clara University. He specializes in system analysis and design, with a focus on implementing agile methodologies for software development.
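The standalone heuristic described above can be sketched as a simple per-process check on the output-to-input traffic ratio. The record fields, threshold value, and function names here are illustrative assumptions, not the paper's actual implementation.

```python
# Hedged sketch: flag processes whose output traffic greatly exceeds
# their input traffic, one of the signals the standalone algorithm uses.

def flag_suspicious(processes, ratio_threshold=10.0):
    """Return names of processes sending far more traffic than they receive."""
    suspicious = []
    for proc in processes:
        # Guard against division by zero for processes receiving no traffic.
        ratio = proc["bytes_out"] / max(proc["bytes_in"], 1)
        if ratio > ratio_threshold:
            suspicious.append(proc["name"])
    return suspicious

procs = [
    {"name": "browser", "bytes_in": 50_000, "bytes_out": 4_000},
    {"name": "suspect", "bytes_in": 200,    "bytes_out": 90_000},
]
print(flag_suspicious(procs))
```

In a real deployment this check would run periodically and be combined with the response-time parameter before triggering the network algorithm.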
In web information retrieval, terms or keywords are used to index documents. These terms may appear in special locations such as the title, subtitles, headers, and hyperlinks. The vector space model ignores the importance of these terms with respect to their position when calculating the weight of the indexing terms. The effectiveness of the vector space model depends crucially on the weights applied to the terms of the document vectors. These weights are computed by a term-weight evaluation scheme based on the frequency of the terms in the document and in the collection: terms that occur more often in a document are treated as more important, whereas terms that occur less frequently throughout the collection are given a higher weight. In the N-level vector space approach, the importance of terms with respect to their position is taken into account. The web document is logically divided into N layers according to its structure, and weights are assigned to terms based on the layer in which they appear. Different weight evaluation schemes proposed for the vector space model are applied to the N-level vector space model and compared. The N-layer vector space model gives better results than the standard vector space model: using cosine similarity with all six weight evaluation methods formed from different local and global weights, the average precision and average recall of the N-layer model are consistently higher than those of the vector space model.
General Terms: Web Information Retrieval, Web Mining
Keywords: N-layer vector space model, global weight, local weight, weight evaluation scheme.
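The layer-weighting idea above can be sketched as follows: a term's frequency is scaled by the weight of the layer in which it occurs, and documents are compared with cosine similarity. The specific layer names and weight values are illustrative assumptions, not the paper's exact scheme.

```python
# Sketch of layer-weighted term vectors plus cosine similarity.
# LAYER_WEIGHTS values are assumed for illustration (title > header > body).
import math

LAYER_WEIGHTS = {"title": 3.0, "header": 2.0, "body": 1.0}

def layered_term_vector(doc):
    """doc: {layer_name: [terms]} -> {term: layer-weighted frequency}."""
    vec = {}
    for layer, terms in doc.items():
        w = LAYER_WEIGHTS[layer]
        for t in terms:
            vec[t] = vec.get(t, 0.0) + w
    return vec

def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

doc = {"title": ["retrieval"], "body": ["web", "retrieval", "model"]}
query = {"body": ["web", "retrieval"]}
print(cosine(layered_term_vector(doc), layered_term_vector(query)))
```

Because "retrieval" appears in the title layer, it dominates the document vector, which is the positional information the flat vector space model discards.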