With the massive increase in the volume of digitally produced documents, government departments face a logistical challenge in manually reviewing documents for sensitivity before they can be opened to the public. When reviewing a document, sensitivity reviewers often need quick access to related information from other documents in the collection. For example, documents that mention the same topic or event can provide reviewers with useful contextual information and help them make consistent sensitivity judgements more quickly. However, manually identifying groups of such related documents in large, unstructured collections is infeasible. In this work, we present a sensitivity review system that automatically identifies groups of related documents to assist reviewers and increase the efficiency of sensitivity review. In particular, our system groups the documents awaiting sensitivity review by their semantic categories (e.g., criminality). Moreover, the system identifies chronological, coherent information threads that describe the full context of an event, activity or discussion that may be spread across multiple documents. Additionally, the system prioritises the identified semantic categories and information threads for review, leveraging automatic sensitivity classification to maximise the number of documents that can be opened to the public within a limited review time budget.
CCS CONCEPTS
• Information systems → Clustering and classification.
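The abstract does not specify how categories and threads are prioritised, so the following is only a minimal sketch of one plausible approach, not the system's actual method. It assumes that a sensitivity classifier outputs a probability of sensitivity for each document and that each document has an estimated review time; the `Group` class, its fields and the `prioritise` function are hypothetical names introduced for illustration. Groups are ranked greedily by the expected number of openable (non-sensitive) documents per minute of review and are selected while they fit in the time budget.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Group:
    """Hypothetical group of documents: a semantic category or an information thread."""
    name: str
    sensitivity_probs: List[float]  # assumed classifier output: P(sensitive) per document
    review_minutes: List[float]     # assumed estimated review time per document

    def expected_open(self) -> float:
        # Expected number of documents in the group that are not sensitive.
        return sum(1.0 - p for p in self.sensitivity_probs)

    def cost(self) -> float:
        # Total estimated time needed to review every document in the group.
        return sum(self.review_minutes)


def prioritise(groups: List[Group], budget_minutes: float) -> List[str]:
    """Greedy heuristic (an assumption, not the paper's algorithm): rank groups by
    expected openable documents per review minute, then keep each group that
    still fits in the remaining time budget."""
    ranked = sorted(
        groups,
        key=lambda g: g.expected_open() / max(g.cost(), 1e-9),  # avoid division by zero
        reverse=True,
    )
    selected: List[str] = []
    spent = 0.0
    for g in ranked:
        if spent + g.cost() <= budget_minutes:
            selected.append(g.name)
            spent += g.cost()
    return selected
```

For example, with a 25-minute budget, a thread of three likely non-sensitive documents (15 minutes) would be scheduled before a category of two likely sensitive documents (20 minutes). Because selecting whole groups under a budget is a knapsack-style problem, this ratio-based greedy ordering is a simple approximation rather than a guaranteed optimum.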