With emerging technologies and their associated devices, a massive amount of data is predicted to be created in the next few years. In fact, as much as 90% of current data were created in the last couple of years, a trend that will continue for the foreseeable future. Sustainable computing studies the process by which computer engineers and scientists design computers and associated subsystems efficiently and effectively with minimal impact on the environment. However, current intelligent machine-learning systems are performance driven: the focus is on the predictive/classification accuracy, based on known properties learned from the training samples. For instance, most machine-learning-based nonparametric models are known to require high computational cost in order to find the global optima. With a learning task on a large dataset, the number of hidden nodes within the network will therefore increase significantly, which eventually leads to an exponential rise in computational complexity. This paper thus reviews the theoretical and experimental data-modeling literature, in large-scale data-intensive fields, relating to: (1) model efficiency, including computational requirements in learning, and data-intensive areas' structure and design, and introduces (2) new algorithmic approaches with the least memory requirements and processing to minimize computational cost, while maintaining/improving predictive/classification accuracy and stability.
An ever-increasing number of computing devices interconnected through wireless networks in cyber-physical-social systems, and the significant amount of sensitive network data transmitted among them, have raised security and privacy concerns. An intrusion detection system (IDS) is known as an effective defence mechanism, and most recently machine learning (ML) methods have been used for its development. However, Internet of Things (IoT) devices often have limited computational resources, such as a limited energy source, computational power and memory; thus, traditional ML-based IDSs that require extensive computational resources are not suitable for running on such devices. This study therefore aims to design and develop a lightweight ML-based IDS tailored for resource-constrained devices. Specifically, the study proposes a lightweight ML-based IDS model named IMPACT (IMPersonation Attack deteCTion using deep auto-encoder and feature abstraction). It is based on deep feature learning with a gradient-based linear Support Vector Machine (SVM), and it can be deployed and run on resource-constrained devices because it reduces the number of features through feature extraction and selection using a stacked autoencoder (SAE), mutual information (MI) and C4.8 wrapper. IMPACT is trained on the Aegean Wi-Fi Intrusion Dataset (AWID) to detect impersonation attacks. Numerical results show that the proposed IMPACT achieved 98.22% accuracy with a 97.64% detection rate and a 1.20% false alarm rate, and outperformed existing state-of-the-art benchmark models. Another key contribution of this study is the investigation of the features in the AWID dataset and their usability for further development of IDSs.
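The mutual-information step mentioned in this abstract can be illustrated with a minimal sketch. It is not the IMPACT pipeline: it only ranks discrete features by their mutual information with the class label, and the function names and toy records are made up for illustration.

```python
import math
from collections import Counter

def mutual_information(feature, labels):
    """Mutual information (in bits) between a discrete feature and the labels."""
    n = len(labels)
    joint = Counter(zip(feature, labels))
    fx = Counter(feature)
    fy = Counter(labels)
    mi = 0.0
    for (x, y), c in joint.items():
        p_xy = c / n
        # p(x, y) / (p(x) * p(y)) simplifies to c * n / (count_x * count_y)
        mi += p_xy * math.log2(c * n / (fx[x] * fy[y]))
    return mi

def rank_features(rows, labels):
    """Return feature indices sorted by decreasing MI, plus the raw scores."""
    n_features = len(rows[0])
    scores = [mutual_information([r[j] for r in rows], labels)
              for j in range(n_features)]
    order = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return order, scores

# Hypothetical toy records with three binary features; label 1 = attack.
rows = [(0, 1, 0), (0, 0, 1), (1, 1, 0), (1, 0, 1), (0, 1, 1), (1, 0, 0)]
labels = [0, 0, 1, 1, 0, 1]
order, scores = rank_features(rows, labels)
print(order, [round(s, 3) for s in scores])
```

A feature selector in this style would keep only the top-ranked indices before training a lightweight classifier; how many to keep is a design choice the abstract does not specify.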
Members of a criminal organization, who hold central positions in the organization, are usually targeted by criminal investigators for removal or surveillance. This is because they play key and influential roles by acting as commanders who issue instructions or serve as gatekeepers. Removing these central members (i.e., influential members) is most likely to disrupt the organization and put it out of business. Most often, criminal investigators are even more interested in knowing the portion of these influential members, who are the immediate leaders of lower-level criminals. These lower-level criminals are the ones who usually carry out the criminal works; therefore, they are easier to identify. The ultimate goal of investigators is to identify the immediate leaders of these lower-level criminals in order to disrupt future crimes. We propose in this paper a forensic analysis system called SIIMCO that can identify the influential members of a criminal organization. Given a list of lower-level criminals in a criminal organization, SIIMCO can also identify the immediate leaders of these criminals. SIIMCO first constructs a network representing a criminal organization from either Mobile Communication Data that belongs to the organization or from crime incident reports. It adopts the concept space approach to automatically construct a network from crime incident reports. In such a network, a vertex represents an individual criminal and a link represents the relationship between two criminals. SIIMCO employs formulas that quantify the degree of influence/importance of each vertex in the network relative to all other vertices. We present these formulas through a series of refinements. All the formulas incorporate novel weighting schemes for the edges of networks. We evaluated the quality of SIIMCO by comparing it experimentally with two other systems. Results showed marked improvement.
Botnets, which consist of remotely controlled compromised machines called bots, provide a distributed platform for several threats against cyber world entities and enterprises. An intrusion detection system (IDS) provides an efficient countermeasure against botnets. It continually monitors and analyzes network traffic for potential vulnerabilities and the possible existence of active attacks. A payload-inspection-based IDS (PI-IDS) identifies active intrusion attempts by inspecting the payload of transmission control protocol and user datagram protocol packets and comparing it with previously seen attack signatures. However, the ability of a PI-IDS to detect intrusions might be incapacitated by packet encryption. A traffic-based IDS (T-IDS) alleviates the shortcomings of PI-IDS, as it does not inspect the packet payload; instead, it analyzes the packet header to identify intrusions. As network traffic grows rapidly, not only is the detection rate critical, but the efficiency and scalability of the IDS also become more significant. In this paper, we propose a state-of-the-art T-IDS built on a novel randomized data partitioned learning model (RDPLM), relying on a compact network feature set and feature selection techniques, simplified subspacing and a multiple randomized meta-learning technique. The proposed model has achieved 99.984% accuracy and 21.38 s training time on a well-known benchmark botnet dataset. Experimental results demonstrate that the proposed methodology outperforms other well-known machine-learning models used in the same detection task, namely, sequential minimal optimization, deep neural network, C4.5, reduced error pruning tree, and randomTree.
We introduce a forensic analysis system called ECLfinder that identifies the influential members of a criminal organization as well as the immediate leaders of a given list of lower-level criminals. Criminal investigators usually seek to identify the influential members of criminal organizations, because eliminating them is most likely to hinder and disrupt the operations of these organizations and put them out of business. First, ECLfinder constructs a network representing a criminal organization from either Mobile Communication Data associated with the organization or crime incident reports that include information about the organization. It then constructs a Minimum Spanning Tree (MST) of the network. It identifies the influential members of a criminal organization by determining the important vertices in the network representing the organization, using the concept of existence dependency. Each vertex v is assigned a score, which is the number of other vertices, whose existence in MST is dependent on v. Vertices are ranked based on their scores. Criminals represented by the top ranked vertices are considered the influential members of the criminal organization represented by the network. We evaluated the quality of ECLfinder by comparing it experimentally with three other systems. Results showed marked improvement.
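The MST-based scoring described in this abstract can be sketched as follows. The abstract does not define "existence dependency" precisely, so this sketch makes a loud assumption: the MST is rooted at a chosen vertex, and a vertex depends on v if it lies in v's subtree (removing v would cut it off from the root). The edge weights, the choice of root, and all names are hypothetical, and this is not claimed to be the ECLfinder algorithm.

```python
import heapq
from collections import defaultdict

def minimum_spanning_tree(edges, root):
    """Prim's algorithm; returns a parent -> children map of the MST rooted at root."""
    adj = defaultdict(list)
    for u, v, w in edges:
        adj[u].append((w, v))
        adj[v].append((w, u))
    children = defaultdict(list)
    visited = {root}
    heap = [(w, root, v) for w, v in adj[root]]
    heapq.heapify(heap)
    while heap:
        w, u, v = heapq.heappop(heap)
        if v in visited:
            continue
        visited.add(v)
        children[u].append(v)
        for w2, nxt in adj[v]:
            if nxt not in visited:
                heapq.heappush(heap, (w2, v, nxt))
    return children

def dependency_scores(children, root):
    """ASSUMPTION: score of v = number of descendants of v in the rooted MST."""
    scores = {}
    def count(v):
        total = 0
        for c in children.get(v, []):
            total += 1 + count(c)
        scores[v] = total
        return total
    count(root)
    return scores

# Hypothetical network: (member, member, weight); a lower weight means a closer tie.
edges = [("A", "B", 1), ("A", "C", 2), ("B", "D", 1),
         ("B", "E", 3), ("C", "E", 4), ("D", "E", 5)]
tree = minimum_spanning_tree(edges, root="A")
scores = dependency_scores(tree, root="A")
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked, scores)
```

Under this reading, the top-ranked vertices are those whose removal would detach the most members from the rest of the tree.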
Background: Understanding the genetic networks and their role in chronic diseases (e.g., cancer) is one of the important objectives of biological researchers. In this work, we present a text mining system that constructs a gene-gene-interaction network for the entire human genome and then performs network analysis to identify disease-related genes. We recognize the interacting genes based on their co-occurrence frequency within the biomedical literature and by employing linear and non-linear rare-event classification models. We analyze the constructed network of genes by using different network centrality measures to decide on the importance of each gene. Specifically, we apply betweenness, closeness, eigenvector, and degree centrality metrics to rank the central genes of the network and to identify possible cancer-related genes. Results: We evaluated the top 15 ranked genes for different cancer types (i.e., Prostate, Breast, and Lung Cancer). The average precisions for identifying breast, prostate, and lung cancer genes vary between 80-100%. On a prostate case study, the system predicted an average of 80% prostate-related genes. Conclusions: The results show that our system has the potential for improving the prediction accuracy of identifying gene-gene interaction and disease-gene associations. We also conduct a prostate cancer case study by using the threshold property in logistic regression, and we compare our approach with some of the state-of-the-art methods. Electronic supplementary material: The online version of this article (10.1186/s12859-019-2634-7) contains supplementary material, which is available to authorized users.
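Two of the centrality measures named in this abstract, degree and closeness, can be illustrated with a minimal sketch on an unweighted, connected graph. The gene identifiers G1 to G5 and the edges are placeholders, not real interactions, and the sketch does not reproduce the authors' system.

```python
from collections import defaultdict, deque

def degree_centrality(adj):
    """Fraction of the other nodes that each node is directly linked to."""
    n = len(adj)
    return {v: len(nb) / (n - 1) for v, nb in adj.items()}

def closeness_centrality(adj):
    """Reachable-node count divided by the sum of shortest-path distances (BFS)."""
    out = {}
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total = sum(dist.values())
        out[s] = (len(dist) - 1) / total if total else 0.0
    return out

# Placeholder co-occurrence edges between hypothetical genes.
edges = [("G1", "G2"), ("G1", "G3"), ("G1", "G4"), ("G4", "G5")]
adj = defaultdict(set)
for a, b in edges:
    adj[a].add(b)
    adj[b].add(a)

degree = degree_centrality(adj)
closeness = closeness_centrality(adj)
top_gene = max(closeness, key=closeness.get)
print(top_gene, degree[top_gene], closeness[top_gene])
```

Ranking genes by such scores and inspecting the top entries mirrors the kind of analysis the abstract describes; betweenness and eigenvector centrality would need additional code.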
In this paper we propose a forensic analysis system called CISRI that helps forensic investigators determine the most influential members of a criminal group, who are related to known members of the group, for the purposes of investigation. In the CISRI framework, we describe the structural relationships between the members of a criminal group in terms of a graph. In such a graph, a node represents a member of a criminal group, an edge connecting two nodes represents the relationship between two members of the group, and the weight of an edge represents the degree of the relationship between those two members. Using this representation, we propose a method that determines the relative importance of nodes in a graph with respect to a given set of query nodes. Most current approaches that study relative importance determine the relative importance of a node under consideration by estimating the contribution of each query node individually to the importance of this node while overlooking the contribution of the query nodes collectively to the importance of the node under consideration. This may lead to results with low precision. CISRI overcomes this limitation by: (1) computing the contribution of the overall set of query nodes to the importance of a node under consideration, and (2) adopting a tight constraint calculation that considers how much each query node contributes to the relative importance of a node under consideration. This leads to accurate identification of nodes in the graph that are important, in relation to the query nodes. In the framework of CISRI, a graph is constructed from mobile communication records (e.g., phone calls and messages), where a node represents a caller and the weight of an edge reflects the number of contacts between two callers. We evaluated the quality of CISRI by comparing it experimentally with three comparable methods. Our results showed marked improvement.
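The graph construction described in this abstract, with edge weights equal to the number of contacts between two callers, can be sketched as below. For the relative-importance step the sketch substitutes a standard random walk with restart over the whole query set; this is a swapped-in technique chosen for illustration, not the CISRI calculation, and the records, names, and parameter values are hypothetical.

```python
from collections import Counter, defaultdict

def build_contact_graph(records):
    """Undirected graph whose edge weight is the number of contacts in a pair."""
    weights = Counter()
    for a, b in records:
        if a != b:
            weights[frozenset((a, b))] += 1
    graph = defaultdict(dict)
    for pair, w in weights.items():
        a, b = tuple(pair)
        graph[a][b] = w
        graph[b][a] = w
    return graph

def random_walk_with_restart(graph, query, restart=0.15, iters=100):
    """Importance of each node relative to the query set, considered jointly."""
    nodes = list(graph)
    restart_vec = {v: (1.0 / len(query) if v in query else 0.0) for v in nodes}
    score = dict(restart_vec)
    for _ in range(iters):
        nxt = {v: restart * restart_vec[v] for v in nodes}
        for u in nodes:
            total = sum(graph[u].values())
            for v, w in graph[u].items():
                # Move along an edge with probability proportional to its weight.
                nxt[v] += (1 - restart) * score[u] * w / total
        score = nxt
    return score

# Hypothetical call/message records: (caller, callee).
records = [("p1", "p2"), ("p2", "p1"), ("p1", "p3"),
           ("p2", "p3"), ("p3", "p4"), ("p4", "p5")]
graph = build_contact_graph(records)
query = {"p1", "p2"}
score = random_walk_with_restart(graph, query)
candidates = sorted((v for v in graph if v not in query),
                    key=score.get, reverse=True)
print(candidates)
```

Restarting at the query set as a whole, rather than once per query node, is one simple way to reflect the collective contribution of the query nodes that the abstract emphasizes.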