Social media platforms are commonly employed by law enforcement agencies for collecting Open Source Intelligence (OSINT) on criminals and assessing the risk they pose to the environments they live in. However, since no prior research has investigated the relationships between hackers' use of social media platforms and their likelihood of generating cyberattacks, this practice is less common among Information Technology teams. Addressing this empirical gap, we draw on social learning theory and estimate the relationships between hackers' use of Facebook, Twitter, and YouTube and the frequency of web defacement attacks they generate at different times (weekdays vs. weekends) and against different targets (USA vs. non-USA websites). To answer our research questions, we use hackers' reports of web defacements they generated (available on http://www.zone-h.org) and complement them with an independent data collection we launched to identify these hackers' use of different social media platforms. Results from a series of Negative Binomial Regression analyses reveal that hackers' use of social media platforms, and specifically Twitter and Facebook, significantly increases the frequency of web defacement attacks they generate. However, while using these social media platforms significantly increases the volume of web defacement attacks these hackers generate during weekdays, it has no association with the volume of web defacements they launch over weekends. Finally, although hackers' use of both Facebook and Twitter accounts increases the frequency of attacks they generate against non-USA websites, only the use of Twitter significantly increases the volume of web defacement attacks against USA websites.
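As a rough illustration of the modeling approach named above, the sketch below fits a negative binomial regression of attack counts on platform-use indicators with statsmodels. The data are simulated and the column names (defacements, uses_twitter, etc.) are hypothetical stand-ins, not the study's variables.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data: one row per hacker, binary platform-use indicators.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "uses_twitter":  rng.integers(0, 2, n),
    "uses_facebook": rng.integers(0, 2, n),
    "uses_youtube":  rng.integers(0, 2, n),
})
# Simulated overdispersed attack counts, higher for platform users.
mu = np.exp(0.5 + 0.8 * df.uses_twitter + 0.5 * df.uses_facebook)
df["defacements"] = rng.negative_binomial(n=2, p=2 / (2 + mu))

# Negative binomial regression handles the overdispersion typical of
# attack-count data better than a plain Poisson model.
model = smf.negativebinomial(
    "defacements ~ uses_twitter + uses_facebook + uses_youtube",
    data=df).fit()
print(model.summary())
```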
A distributed-memory parallel version of the group-average hierarchical agglomerative clustering algorithm is proposed to enable scaling the document clustering problem to large collections. Using standard message passing operations reduces interprocess communication while maintaining efficient load balancing. In a series of experiments using a subset of a standard Text REtrieval Conference (TREC) test collection, our parallel hierarchical clustering algorithm is shown to be scalable in terms of processors efficiently used and collection size. Results show that our algorithm performs close to the expected O(n²/p) time on p processors rather than the worst-case O(n³/p) time. Furthermore, the O(n²/p) memory complexity per node allows larger collections to be clustered as the number of nodes increases. While partitioning algorithms such as k-means are trivially parallelizable, our results confirm those of other studies which showed that hierarchical algorithms produce significantly tighter clusters in the document clustering task. Finally, we show how our parallel hierarchical agglomerative clustering algorithm can be used as the clustering subroutine for a parallel version of the buckshot algorithm to cluster the complete TREC collection at near theoretical runtime expectations.

Introduction

Document clustering has long been considered a means to potentially improve both retrieval effectiveness and efficiency; however, the intensive computation needed to cluster an entire collection makes its application to large datasets difficult. Accordingly, there is little work on effectively clustering entire large, standard text collections, and even less with the intent of using these clusterings to aid retrieval. Rather, much work has focused on either simplified clustering algorithms or partial clusterings, such as clustering only the results for a given query. Clustering algorithms generally trade accuracy against speed. Hierarchical agglomerative clustering algorithms compute a full document-to-document similarity matrix, and their clusterings are typically viewed as more accurate than other types of clusterings; however, their quadratic computational complexity makes them unrealistic for large document collections. Other clustering algorithms, such as the k-means and single-pass algorithms, iteratively partition the data into clusters. Although these partitioning algorithms run in linear time, the assignment of documents to moving centroids produces different clusterings with each run. Some algorithms combine the accuracy of hierarchical agglomerative algorithms with the speed of partitioning algorithms, yielding a fast algorithm with reasonable accuracy. One such algorithm is the buckshot algorithm, which uses a hierarchical agglomerative algorithm as a clustering subroutine. We propose a hierarchical agglomerative clustering algorithm designed for a distributed memory system in which we use the message passing model to facilitate in...
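For readers unfamiliar with the merge criterion, here is a serial, single-process sketch of group-average (UPGMA) agglomerative clustering over TF-IDF document vectors using scipy. It illustrates only the clustering subroutine, not the paper's distributed-memory MPI implementation, and the toy documents are invented.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

docs = [
    "parallel clustering of large document collections",
    "message passing for distributed memory systems",
    "hierarchical agglomerative clustering with group average linkage",
    "k-means partitions documents around moving centroids",
]

# Cosine distance between TF-IDF vectors is standard for documents.
X = TfidfVectorizer().fit_transform(docs).toarray()
dist = pdist(X, metric="cosine")

# method="average" is the group-average (UPGMA) merge criterion:
# the distance between two clusters is the mean pairwise distance
# between their members.
Z = linkage(dist, method="average")
print(fcluster(Z, t=2, criterion="maxclust"))
```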
We present a unified linear program formulation for optimal content delivery in content delivery networks (CDNs), taking into account various costs and constraints associated with content dissemination from the origin server to storage nodes, data storage, and the eventual fetching of content from storage nodes by end users. Our formulation can be used to achieve a variety of performance goals and system behaviors, including the bounding of fetch delay, load balancing, and robustness against node and arc failures. Simulation results suggest that our formulation performs significantly better than the traditional minimum k-median formulation for the delivery of multiple content objects, even in modest settings (a small network, few objects, a low storage budget, and low dissemination costs).
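To make the flavor of such a formulation concrete, the sketch below solves a toy placement LP with scipy: relaxed storage indicators y_s and fetch fractions f_us minimize dissemination-plus-storage and fetch costs, subject to demand satisfaction and a storage budget. All numbers and the exact constraint set are illustrative assumptions, not the paper's formulation.

```python
import numpy as np
from scipy.optimize import linprog

# Variables: [y0, y1, f00, f01, f10, f11]
#   y_s  = (relaxed) indicator that the object is stored at node s
#   f_us = fraction of user u's demand fetched from node s
dissem_store = [4.0, 6.0]      # dissemination + storage cost per node
fetch = [1.0, 3.0, 5.0, 2.0]   # fetch cost per (user, node) pair
c = np.array(dissem_store + fetch)

# Each user's demand must be fully served.
A_eq = np.array([
    [0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 1, 1],
])
b_eq = np.array([1.0, 1.0])

# f_us <= y_s (a user can only fetch from nodes storing the object),
# plus a budget on the total number of replicas.
A_ub = np.array([
    [-1, 0, 1, 0, 0, 0],
    [0, -1, 0, 1, 0, 0],
    [-1, 0, 0, 0, 1, 0],
    [0, -1, 0, 0, 0, 1],
    [1, 1, 0, 0, 0, 0],   # storage budget row
])
b_ub = np.array([0, 0, 0, 0, 1.5])

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, 1)] * 6)
print(res.x, res.fun)
```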
Cyber security experts in the U.S. and around the globe assess potential threats to their organizations by evaluating potential attackers' skills, knowledge, resources, access to the target organization, and motivation to offend (i.e., SKRAM). Unfortunately, this model fails to incorporate insights regarding online offenders' traits and the conditions surrounding the development of online criminal events. Drawing on contemporary criminological models, we present a theoretical rationale for revising the SKRAM model. The revised model suggests that, in addition to the classical SKRAM components, both individual attributes and certain offline and online circumstances fuel cyber attackers' motivation to offend and increase the probability that a cyber-attack will be launched against an organization. Consistent with our proposed model and its potential for predicting the occurrence of different types of cyber-dependent crimes against organizations, we propose that Information Technology professionals working to facilitate safe computing environments should design new approaches for collecting indicators of attackers' potential threat and for predicting the occurrence and timing of cyber-dependent crimes.
XML is a flexible and powerful tool that enables information and security sharing in heterogeneous environments. Scalable technologies are needed to effectively manage the growing volumes of XML data. A wide variety of methods exist for storing and searching XML data; the two most common techniques are conventional tree-based and relational approaches. Tree-based approaches represent XML as a tree and use indexes and path join algorithms to process queries. In contrast, the relational approach utilizes the power of a mature relational database to store and search XML: it maps XML queries to SQL and reconstructs the XML from the database results. To date, the limited acceptance of the relational approach to XML processing is due to the need to redesign the relational schema each time a new XML hierarchy is defined. We, in contrast, describe a relational approach with a fixed schema, eliminating the need for schema redesign at the expense of potentially longer runtimes. We show, however, that these potentially longer runtimes are still significantly shorter than those of the tree-based approach. We use a popular XML benchmark to compare the scalability of both approaches. We generated large collections of heterogeneous XML documents ranging in size from 500 MB to 8 GB using the XBench benchmark. The scalability of each method was measured by running XML queries covering a wide range of XML search features on each collection, examining behavior over different query features as the collection size increases. In addition, we examine the performance of each method as the result size and the number of predicates increase. Our results show that our relational approach provides a scalable approach to XML retrieval by leveraging existing relational database optimizations. Furthermore, we show that the relational approach typically outperforms the tree-based approach while scaling consistently over all collections studied.
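The following sketch illustrates the fixed-schema idea under our own assumptions (a single node table keyed by rooted label paths; not necessarily the paper's schema): any XML document shreds into the same table, and a path query becomes an ordinary SQL predicate, so no schema redesign is needed for new hierarchies.

```python
import sqlite3
import itertools
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE node (
        id      INTEGER PRIMARY KEY,
        parent  INTEGER,        -- NULL for the root element
        tag     TEXT,
        path    TEXT,           -- rooted label path, e.g. /catalog/item
        value   TEXT
    )""")

ids = itertools.count(1)

def shred(elem, parent_id, path):
    """Insert an element into the node table, then recurse into children."""
    node_id = next(ids)
    node_path = f"{path}/{elem.tag}"
    conn.execute("INSERT INTO node VALUES (?, ?, ?, ?, ?)",
                 (node_id, parent_id, elem.tag, node_path,
                  (elem.text or "").strip()))
    for child in elem:
        shred(child, node_id, node_path)

root = ET.fromstring("<catalog><item><title>XML</title></item></catalog>")
shred(root, None, "")

# The XML path query /catalog/item/title maps to a plain SQL predicate.
for (value,) in conn.execute(
        "SELECT value FROM node WHERE path = '/catalog/item/title'"):
    print(value)   # -> XML
```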
We present a novel approach to detecting misuse within an information retrieval system by gathering and maintaining knowledge of the behavior of the user, rather than anticipating attacks by unknown assailants. Our approach builds and maintains a profile of the system user's behavior by tracking, or monitoring, user activity within the information retrieval system. Any new activity of the user is compared to the user profile to detect potential misuse by the authorized user. We propose four different methods to detect misuse in information retrieval systems. Our experimental results on a 2 GB collection favorably demonstrate the validity of our approach.
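As one plausible instantiation of this idea (our assumption; the paper's four detection methods are not reproduced here), the sketch below models a user's normal queries as a term-frequency profile and flags new queries whose cosine similarity to the profile falls below a threshold.

```python
from collections import Counter
import math

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Profile built from the user's historical queries (hypothetical data).
history = ["quarterly sales report", "sales forecast europe",
           "regional sales dashboard"]
profile = Counter(t for q in history for t in q.split())

def is_misuse(query: str, threshold: float = 0.2) -> bool:
    """Flag queries whose term overlap with the profile is low."""
    return cosine(Counter(query.split()), profile) < threshold

print(is_misuse("europe sales forecast"))     # False: matches the profile
print(is_misuse("employee salary database"))  # True: off-profile activity
```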