Social media platforms are commonly employed by law enforcement agencies for collecting Open Source Intelligence (OSINT) on criminals and assessing the risk they pose to the environment they live in. However, since no prior research has investigated the relationship between hackers' use of social media platforms and their likelihood of generating cyberattacks, this practice is less common among information technology teams. Addressing this empirical gap, we draw on social learning theory and estimate the relationships between hackers' use of Facebook, Twitter, and YouTube and the frequency of web defacement attacks they generate at different times (weekdays vs. weekends) and against different targets (USA vs. non-USA websites). To answer our research questions, we use hackers' reports of web defacements they generated (available on http://www.zone-h.org) and complement them with an independent data collection we launched to identify these hackers' use of different social media platforms. Results from a series of negative binomial regression analyses reveal that hackers' use of social media platforms, specifically Twitter and Facebook, significantly increases the frequency of web defacement attacks they generate. However, while using these platforms significantly increases the volume of web defacement attacks these hackers generate during weekdays, it has no association with the volume of web defacements they launch over weekends. Finally, although hackers' use of both Facebook and Twitter accounts increases the frequency of attacks they generate against non-USA websites, only the use of Twitter significantly increases the volume of web defacement attacks against USA websites.
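Negative binomial regression is the natural choice here because per-hacker attack counts are typically overdispersed (variance exceeds the mean), which a Poisson model cannot capture. A minimal, self-contained sketch of why: the negative binomial arises as a gamma-Poisson mixture, giving Var(Y) = μ + αμ² > μ. All parameter values below are illustrative, not taken from the study.

```python
import math
import random

def neg_binomial_sample(mu, alpha, rng):
    """Draw one NB2 count via the gamma-Poisson mixture:
    lam ~ Gamma(shape=1/alpha, scale=alpha*mu), then Y ~ Poisson(lam)."""
    lam = rng.gammavariate(1.0 / alpha, alpha * mu)
    # Knuth's Poisson sampler (adequate for small lam)
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

rng = random.Random(42)
mu, alpha = 3.0, 0.5  # illustrative mean and dispersion parameters
ys = [neg_binomial_sample(mu, alpha, rng) for _ in range(20000)]

mean = sum(ys) / len(ys)
var = sum((y - mean) ** 2 for y in ys) / len(ys)
# Theory predicts variance near mu + alpha*mu**2 = 7.5, well above the mean of 3
print(round(mean, 2), round(var, 2))
```

The overdispersion visible in the sample is exactly what the dispersion parameter α absorbs in the regression analyses.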
A distributed-memory parallel version of the group-average hierarchical agglomerative clustering algorithm is proposed to enable scaling the document clustering problem to large collections. Using standard message passing operations reduces interprocess communication while maintaining efficient load balancing. In a series of experiments using a subset of a standard Text REtrieval Conference (TREC) test collection, our parallel hierarchical clustering algorithm is shown to be scalable in terms of processors efficiently used and collection size. Results show that our algorithm performs close to the expected O(n²/p) time on p processors rather than the worst-case O(n³/p) time. Furthermore, the O(n²/p) memory complexity per node allows larger collections to be clustered as the number of nodes increases. While partitioning algorithms such as k-means are trivially parallelizable, our results confirm those of other studies showing that hierarchical algorithms produce significantly tighter clusters in the document clustering task. Finally, we show how our parallel hierarchical agglomerative clustering algorithm can be used as the clustering subroutine for a parallel version of the buckshot algorithm to cluster the complete TREC collection at near theoretical runtime expectations.

Introduction

Document clustering has long been considered a means to potentially improve both retrieval effectiveness and efficiency; however, the intensive computation necessary to cluster an entire collection makes its application to large datasets difficult. Accordingly, there is little work on effectively clustering entire large, standard text collections, and even less with the intent of using these clusterings to aid retrieval. Rather, much work has focused on either performing simplified clustering algorithms or using only partial clusterings, such as clustering only the results for a given query. Clustering algorithms generally involve a trade-off between accuracy and speed.
Hierarchical agglomerative clustering algorithms calculate a full document-to-document similarity matrix. Their clusterings are typically viewed as more accurate than other types of clusterings; however, the algorithm's at-least-quadratic computational complexity makes it unrealistic for large document collections. Other clustering algorithms, such as k-means and single-pass algorithms, iteratively partition the data into clusters. Although these partitioning algorithms run in linear time, the assignment of documents to moving centroids produces different clusterings with each run. Some algorithms combine the accuracy of hierarchical agglomerative algorithms with the speed of partitioning algorithms to obtain an algorithm that is fast with reasonable accuracy. One such algorithm is the buckshot algorithm, which uses a hierarchical agglomerative algorithm as a clustering subroutine. We propose a hierarchical agglomerative clustering algorithm designed for a distributed memory system in which we use the message passing model to facilitate in...
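Group-average (average-linkage) agglomerative clustering can be sketched in a few lines: repeatedly merge the pair of clusters with the smallest average inter-cluster distance. The toy serial version below (plain Euclidean distance on 2-D points, not the paper's document vectors or its distributed implementation) makes the O(n²) pairwise structure explicit; that pairwise scan is the work the parallel algorithm partitions across p processors.

```python
import math

def avg_linkage_dist(ca, cb, points):
    """Average pairwise Euclidean distance between two clusters of point indices."""
    total = sum(math.dist(points[i], points[j]) for i in ca for j in cb)
    return total / (len(ca) * len(cb))

def group_average_hac(points, k):
    """Naive group-average agglomerative clustering down to k clusters."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        # O(n^2) scan for the closest cluster pair under average linkage
        a, b = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: avg_linkage_dist(clusters[ij[0]], clusters[ij[1]], points),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]

pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(group_average_hac(pts, 2))  # → [[0, 1, 2], [3, 4, 5]]
```

The buckshot algorithm mentioned above would call a routine like this only on a random sample of √(kn) documents, then assign the rest to the resulting cluster centers, which is what keeps its overall runtime linear.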
We present a unified linear program formulation for optimal content delivery in content delivery networks (CDNs), taking into account various costs and constraints associated with content dissemination from the origin server to storage nodes, data storage, and the eventual fetching of content from storage nodes by end users. Our formulation can be used to achieve a variety of performance goals and system behaviors, including the bounding of fetch delay, load balancing, and robustness against node and arc failures. Simulation results suggest that our formulation performs significantly better than the traditional minimum k-median formulation for the delivery of multiple content objects, even under modest circumstances (small network, few objects, low storage budget, low dissemination costs).
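The cost structure the formulation captures (origin-to-node dissemination, storage, and user fetches) can be illustrated on a toy instance. The sketch below brute-forces the integer placement of a single object over a tiny made-up network rather than solving the linear program itself; all node names, costs, and the budget are illustrative assumptions.

```python
from itertools import chain, combinations

# Toy instance: choose replica locations for one object to minimize
# dissemination + storage + fetch cost, under a replica budget.
nodes = ["n1", "n2", "n3"]
disseminate = {"n1": 4, "n2": 6, "n3": 5}  # origin -> storage node cost
store = {"n1": 2, "n2": 1, "n3": 3}        # per-node storage cost
fetch = {                                  # fetch[u][n]: user u fetching from node n
    "u1": {"n1": 1, "n2": 8, "n3": 4},
    "u2": {"n1": 7, "n2": 2, "n3": 5},
}
budget = 2  # at most two replicas

def total_cost(placement):
    """Dissemination + storage for each replica, plus each user's cheapest fetch."""
    if not placement:
        return float("inf")
    c = sum(disseminate[n] + store[n] for n in placement)
    c += sum(min(fetch[u][n] for n in placement) for u in fetch)
    return c

candidates = chain.from_iterable(combinations(nodes, r) for r in range(1, budget + 1))
best = min(candidates, key=total_cost)
print(best, total_cost(best))  # → ('n1',) 14
```

In the actual LP formulation these placement decisions become fractional variables with linear constraints, which is what makes goals like delay bounds and failure robustness expressible as additional constraint rows.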
Cyber security experts in the U.S. and around the globe assess potential threats to their organizations by evaluating potential attackers' skills, knowledge, resources, access to the target organization, and motivation to offend (i.e., SKRAM). Unfortunately, this model fails to incorporate insights regarding online offenders' traits and the conditions surrounding the development of an online criminal event. Drawing on contemporary criminological models, we present a theoretical rationale for revising the SKRAM model. The revised model suggests that in addition to the classical SKRAM components, both individual attributes and certain offline and online circumstances fuel cyber attackers' motivation to offend and increase the probability that a cyber-attack will be launched against an organization. Consistent with our proposed model and its potential for predicting the occurrence of different types of cyber-dependent crimes against organizations, we propose that information technology professionals seeking to facilitate safe computing environments should design new approaches for collecting indicators of attackers' potential threat and for predicting the occurrence and timing of cyber-dependent crimes.
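As one concrete illustration of how such a revised model could be operationalized, the sketch below scores a hypothetical attacker profile as a weighted sum of the classical SKRAM components plus the revised model's individual and situational additions. The component scales, weights, and profiles are entirely illustrative assumptions, not part of the SKRAM model or the proposed revision.

```python
from dataclasses import dataclass

@dataclass
class AttackerProfile:
    # Classical SKRAM components, each scored 0-10 (illustrative scale)
    skills: float
    knowledge: float
    resources: float
    access: float
    motivation: float
    # Revised-model additions: individual traits and circumstances, 0-10
    individual_attributes: float = 0.0
    situational_triggers: float = 0.0

def threat_score(p: AttackerProfile) -> float:
    """Weighted threat score on a 0-10 scale; the weights are hypothetical."""
    skram = 0.15 * (p.skills + p.knowledge + p.resources + p.access) + 0.2 * p.motivation
    revised = 0.1 * p.individual_attributes + 0.1 * p.situational_triggers
    return round(skram + revised, 2)

low = AttackerProfile(skills=2, knowledge=2, resources=1, access=1, motivation=3)
high = AttackerProfile(skills=8, knowledge=9, resources=6, access=7, motivation=9,
                       individual_attributes=7, situational_triggers=8)
print(threat_score(low), threat_score(high))  # → 1.5 7.8
```

The point of the sketch is only that the revised components enter the score additively: two attackers with identical SKRAM profiles can still differ in predicted threat once individual attributes and situational triggers are included.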
XML is a flexible and powerful tool that enables information and security sharing in heterogeneous environments. Scalable technologies are needed to effectively manage the growing volumes of XML data. A wide variety of methods exist for storing and searching XML data; the two most common techniques are conventional tree-based and relational approaches. Tree-based approaches represent XML as a tree and use indexes and path join algorithms to process queries. In contrast, the relational approach utilizes the power of a mature relational database to store and search XML. This method relationally maps XML queries to SQL and reconstructs the XML from the database results. To date, the limited acceptance of the relational approach to XML processing is due to the need to redesign the relational schema each time a new XML hierarchy is defined. We, in contrast, describe a relational approach with a fixed schema, eliminating the need for schema redesign at the expense of potentially longer runtimes. We show, however, that these potentially longer runtimes are still significantly shorter than those of the tree-based approach. We use a popular XML benchmark to compare the scalability of both approaches. We generated large collections of heterogeneous XML documents ranging in size from 500 MB to 8 GB using the XBench benchmark. The scalability of each method was measured by running XML queries that cover a wide range of XML search features on each collection. We measure the scalability of each method over different query features as the collection size increases. In addition, we examine the performance of each method as the result size and the number of predicates increase. Our results show that our relational approach provides a scalable approach to XML retrieval by leveraging existing relational database optimizations.
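A fixed relational schema of this kind is often realized as a generic node table: every XML element becomes a row (id, parent, tag, text), so new XML hierarchies require no schema changes, and path queries become chains of SQL self-joins. The following is a minimal sketch of that general idea using only the Python standard library, not the paper's actual SQLGenerator schema.

```python
import sqlite3
import xml.etree.ElementTree as ET

doc = "<catalog><book><title>TCP/IP</title></book><book><title>XQuery</title></book></catalog>"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE node (id INTEGER PRIMARY KEY, parent INTEGER, tag TEXT, text TEXT)")

counter = 0
def load(elem, parent=None):
    """Shred an element tree into the fixed node table; any hierarchy fits."""
    global counter
    counter += 1
    nid = counter
    conn.execute("INSERT INTO node VALUES (?, ?, ?, ?)",
                 (nid, parent, elem.tag, (elem.text or "").strip()))
    for child in elem:
        load(child, nid)

load(ET.fromstring(doc))

# The path query /catalog/book/title mapped to a chain of self-joins
titles = conn.execute("""
    SELECT t.text
    FROM node c
    JOIN node b ON b.parent = c.id AND b.tag = 'book'
    JOIN node t ON t.parent = b.id AND t.tag = 'title'
    WHERE c.tag = 'catalog' AND c.parent IS NULL
    ORDER BY t.id
""").fetchall()
print([row[0] for row in titles])  # → ['TCP/IP', 'XQuery']
```

Each path step costs one join, which is why a fixed-schema mapping can be slower per query than a purpose-built tree index, yet it inherits the relational engine's join optimizations and never needs a schema redesign.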
Furthermore, we show that the relational approach typically outperforms the tree-based approach while scaling consistently over all collections studied.
The growing trend of using XML to share security data requires scalable technology to effectively manage the volume and variety of data. Although a wide variety of methods exist for storing and searching XML, the two most common techniques are conventional tree-based approaches and relational approaches. Tree-based approaches represent XML as a tree and use indexes and path join algorithms to process queries. In contrast, the relational approach seeks to utilize the power of a mature relational database to store and search XML. This method relationally maps XML queries to SQL and reconstructs the XML from the database results. We use the XBench benchmark to compare the scalability of the SQLGenerator, our relational approach, with eXist, a popular tree-based approach.

I. INTRODUCTION

XML is a flexible and powerful tool that enables vital security sharing in heterogeneous environments [1]. Since XML can be extended to include domain-specific tags, information can be encoded with meaningful structure and semantics that allow rapid information sharing among devices and organizations. We examine the conventional tree approach and the relational mapping of XML queries to determine which method has the potential to search large collections of XML. We used a modified version of the XBench benchmark (...projects/xbench/index.html) to create a heterogeneous collection of multiple-schema, data-centric XML documents. We generated an 8GB collection from the modified XBench templates. The 500MB, 1GB, 2GB, and 4GB collections were created from random subsets of the 1GB, 2GB, 4GB, and 8GB collections, respectively.

In Fig. 1, we show the total query time for each query. Also shown are the mean and total times for each run. All timings given represent the average execution time of the queries (in random order) over five runs. To ensure a cold cache, the server was rebooted between runs. Overall, the relational approach outperformed the tree-based approach on all five collections. However, the tree-based approach outperformed the relational approach for both quantifier queries. On average, the relational approach took 17.5 times as long to execute the 8GB queries as the 500MB queries. This is very close to the expected linear scaling factor of 16. On the other hand, the tree-based approach took 55.6 times as long to execute the queries on the 8GB collection.

Query times (seconds) by collection size; "tree" is the tree-based approach (eXist) and "rel" is the relational approach (SQLGenerator):

Query      500MB (tree/rel)    1GB (tree/rel)     2GB (tree/rel)     4GB (tree/rel)      8GB (tree/rel)
Q1         15.52 / 0.43        26.75 / 0.44       59.41 / 0.51       131.11 / 0.71       484.24 / 0.67
Q5         14.52 / 0.77        25.72 / 0.78       56.20 / 0.96       129.61 / 1.27       498.83 / 1.47
Q6          2.03 / 31.08        4.64 / 50.26      18.97 / 52.37       28.73 / 183.10     TIMEOUT / 466.58
Q7          2.85 / 25.00        6.43 / 38.88      10.32 / 71.70       26.94 / 243.15     217.72 / 594.19
Q8         14.57 / 0.58        25.89 / 0.51       56.52 / 0.69       131.00 / 0.87       478.43 / 1.10
Q9         14.73 / 0.40        25.38 / 0.41       53.50 / 0.46       124.22 / 0.63       426.58 / 0.72
Q10        16.22 / 1.16        29.89 / 1.51       68.35 / 2.73       139.53 / 5.97       1850.99 / 13.22
Total     158.28 / 63.09      278.09 / 96.94     591.70 / 151.88    1741.33 / 477.44    8001.86 / 1107.84

From the timing results presented, we conclude that the relational approach is highly scalable to i...
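The linear-scaling claim can be checked directly from the reported total times: going from 500 MB to 8 GB is a 16x data increase, so a linearly scaling system should take roughly 16x longer. The quick check below uses the total rows of the table; the ratios differ slightly from the per-query averages quoted in the text, in part because Q6 timed out at 8GB for the tree-based approach.

```python
# Reported total query times (seconds) at 500MB and 8GB
rel_500mb, rel_8gb = 63.09, 1107.84      # relational (SQLGenerator)
tree_500mb, tree_8gb = 158.28, 8001.86   # tree-based (eXist)

rel_ratio = rel_8gb / rel_500mb
tree_ratio = tree_8gb / tree_500mb
# The relational ratio sits near the ideal linear factor of 16;
# the tree-based ratio is several times larger.
print(round(rel_ratio, 1), round(tree_ratio, 1))  # → 17.6 50.6
```

This is the core of the scalability argument: the relational approach degrades almost linearly with collection size, while the tree-based approach degrades super-linearly.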