Detecting inferences in documents is critical for ensuring privacy when sharing information. In this paper, we propose a refined and practical model of inference detection using a reference corpus. Our model is inspired by association rule mining: inferences are based on word co-occurrences. Using the model and taking the Web as the reference corpus, we can find inferences and measure their strength through web-mining algorithms that leverage search engines such as Google or Yahoo!.Our model also includes the important case of private corpora, to model inference detection in enterprise settings in which there is a large private document repository. We find inferences in private corpora by using analogues of our Web-mining algorithms, relying on an index for the corpus rather than a Web search engine.We present results from two experiments. The first experiment demonstrates the performance of our techniques in identifying all the keywords that allow for inference of a particular topic (e.g. "HIV") with confidence above a certain threshold. The second experiment uses the public Enron e-mail dataset. We postulate a sensitive topic and use the Enron corpus and the Web together to find inferences for the topic.These experiments demonstrate that our techniques are practical, and that our model of inference based on word co-occurrence is well-suited to efficient inference detection.
Abstract-The annual incidence of insider attacks continues to grow, and there are indications this trend will continue. While there are a number of existing tools that can accurately identify known attacks, these are reactive (as opposed to proactive) in their enforcement, and may be eluded by previously unseen, adversarial behaviors. This paper proposes an approach that combines Structural Anomaly Detection (SA) from social and information networks and Psychological Profiling (PP) of individuals. SA uses technologies including graph analysis, dynamic tracking, and machine learning to detect structural anomalies in large-scale information network data, while PP constructs dynamic psychological profiles from behavioral patterns. Threats are finally identified through a fusion and ranking of outcomes from SA and PP.The proposed approach is illustrated by applying it to a large data set from a massively multi-player online game, World of Warcraft (WoW). The data set contains behavior traces from over 350,000 characters observed over a period of 6 months. SA is used to predict if and when characters quit their guild (a player association with similarities to a club or workgroup in nongaming contexts), possibly causing damage to these social groups. PP serves to estimate the five-factor personality model for all characters. Both threads show good results on the gaming data set and thus validate the proposed approach.
The amount of contextual data collected, stored, mined, and shared is increasing exponentially. Street cameras, credit card transactions, chat and Twitter logs, e-mail, web site visits, phone logs and recordings, social networking sites, all are examples of data that persists in a manner not under individual control, leading some to declare the death of privacy. We argue here that the ability to generate convincing fake contextual data can be a basic tool in the fight to preserve privacy. One use for the technology is for an individual to make his actual data indistinguishable amongst a pile of false data.In this paper we consider two examples of contextual data, search engine query data and location data. We describe the current state of faking these types of data and our own efforts in this direction.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.