Cross-domain information exchange is necessary to obtain information superiority in the military domain, and it should be based on assigning appropriate security labels to information objects. Most of the data in a defense network is unlabeled, and new unlabeled information is produced every day. Labeling such information manually is labor-intensive and time-consuming, and the volume of unlabeled information grows year by year. This calls for tools that can perform advanced content inspection and automatically determine the security label of an information object. This paper presents a machine learning approach to this problem. To the best of our knowledge, machine learning has hardly been analyzed for this problem, and the analysis of topical classification presented here provides new knowledge and a basis for further work in this area. The results are promising and demonstrate that machine learning can become a useful tool for assisting humans in determining the appropriate security label of an information object.
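As an illustration of how a learned model might suggest a security label for a human to confirm, the following is a minimal sketch of a multinomial naive Bayes text classifier. It is not the method evaluated in the paper: the choice of classifier, the label names (`RESTRICTED`, `UNCLASSIFIED`), the toy training documents, and the function names `train` and `suggest_label` are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Deliberately simple: lowercase and split on whitespace.
    return text.lower().split()


def train(labeled_docs):
    """Count words per label from (text, label) pairs."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    vocab = set()
    for text, label in labeled_docs:
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        doc_counts[label] += 1
        vocab.update(tokens)
    return word_counts, doc_counts, vocab


def suggest_label(model, text):
    """Return the most probable label and the log-score of every label."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in doc_counts:
        total_words = sum(word_counts[label].values())
        # Log prior of the label.
        score = math.log(doc_counts[label] / total_docs)
        for token in tokenize(text):
            if token in vocab:  # words never seen in training are skipped
                # Laplace-smoothed log likelihood of the word given the label.
                score += math.log(
                    (word_counts[label][token] + 1) / (total_words + len(vocab))
                )
        scores[label] = score
    return max(scores, key=scores.get), scores


# Hypothetical toy data; real training would need a labeled document collection.
TRAINING_DOCS = [
    ("patrol route and unit positions for tomorrow", "RESTRICTED"),
    ("unit positions and radio frequencies", "RESTRICTED"),
    ("canteen menu for tomorrow", "UNCLASSIFIED"),
    ("public press release about the open day", "UNCLASSIFIED"),
]

model = train(TRAINING_DOCS)
suggested, _ = suggest_label(model, "updated unit positions")
print(suggested)
```

In line with the assistive role described above, the output of such a model would be a suggestion for a human reviewer, not a final labeling decision.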
(formerly UNIK) for jointly financing the work. I also wish to thank my trial defense opponents Edgar Lopez and Slobodan Petrovic for their feedback and Hallo Langweg at the COINS initiative for organizing it. Lastly, I wish to thank my family for their support.
Abstract: Cross-domain information exchange is an increasingly important capability for conducting efficient and secure operations, both within coalitions and within single nations. A data guard is a common cross-domain sharing solution that inspects the security labels of exported data objects and validates that the objects can be released according to policy. While guard solutions can be implemented with high assurance, obtaining an equivalent level of assurance in the correctness of the security labels easily becomes a hard problem in practical scenarios. Thus, a weakness of the guard-based solution is that there is often limited assurance in the correctness of the security labels. To mitigate this, guards make use of content checkers such as dirty word lists as a means of detecting mislabeled data. To improve the overall security of such cross-domain solutions, we investigate more advanced content checkers based on machine learning. Instead of relying on manually specified dirty word lists, we can build data-driven methods that automatically infer the words associated with classified content. However, care must be taken when constructing and deploying these methods, as naive implementations are vulnerable to manipulation attacks. To provide a better context for performing classification, we monitor the incoming information flow and use the audit trail to construct controlled environments. The usefulness of this deployment scheme is demonstrated using a real collection of classified and unclassified documents.
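The contrast between a manually specified dirty word list and a data-driven one can be sketched as follows. This is only an illustration under stated assumptions, not the content checker described in the abstract: the scoring rule (smoothed log-odds of document frequency), the toy documents, and the names `dirty_word_hits` and `learn_indicative_words` are hypothetical. It is also exactly the kind of naive implementation the abstract warns about, since anyone able to inject documents into the training collection could skew the counts.

```python
import math
from collections import Counter


def tokenize(text):
    # Set of lowercase whitespace-separated tokens (each word counted once per document).
    return set(text.lower().split())


def dirty_word_hits(text, dirty_words):
    """Classic content check: report which listed words occur in the text."""
    return sorted(tokenize(text) & set(dirty_words))


def learn_indicative_words(classified_docs, unclassified_docs, top_k=3):
    """Rank words by smoothed log-odds of appearing in classified documents."""
    classified_df = Counter(w for doc in classified_docs for w in tokenize(doc))
    unclassified_df = Counter(w for doc in unclassified_docs for w in tokenize(doc))
    n_classified = len(classified_docs)
    n_unclassified = len(unclassified_docs)
    scores = {}
    for word in set(classified_df) | set(unclassified_df):
        # Add-one smoothing keeps the ratio finite for one-sided words.
        p_classified = (classified_df[word] + 1) / (n_classified + 2)
        p_unclassified = (unclassified_df[word] + 1) / (n_unclassified + 2)
        scores[word] = math.log(p_classified / p_unclassified)
    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:top_k]


# Hypothetical toy collections standing in for real labeled documents.
CLASSIFIED_DOCS = [
    "convoy departs from grid alpha at dawn",
    "grid alpha is the convoy staging area",
]
UNCLASSIFIED_DOCS = [
    "the cafeteria opens at dawn",
    "parking area is closed",
]

learned_words = learn_indicative_words(CLASSIFIED_DOCS, UNCLASSIFIED_DOCS)
hits = dirty_word_hits("Convoy moves to grid alpha", learned_words)
print(learned_words, hits)
```

The learned list plugs into the same check as a manual list, so the only thing that changes is where the words come from; the controlled-environment scheme described above is not modeled here.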