The Role of Latent Semantic Categories and Clustering in Enhancing the Efficiency of Human Sensitivity Review

Narvala, Hitarth; McDonald, Graham; Ounis, Iadh

doi:10.1145/3498366.3505824

Cited by 5 publications

(6 citation statements)

References 23 publications

“…In our previous work [9], we conducted two user studies using the system that we present in this work. In the user studies, we evaluated the functionalities of our system for efficient sensitivity reviews.…”

Section: Discussion (mentioning)

confidence: 99%

“…For example, the details of an employee's salary are more likely to be sensitive in documents about business discussions than mentions of salaries in documents about political discussions, since politicians' salaries are usually in the public domain. Prioritising particular groups of related documents for review can also help to increase the number of documents that can be opened to the public when there are limited reviewing resources [9] (i.e., openness [6]). However, in large unstructured document collections, it is not practical for reviewers to manually identify such groups of related documents.…”

Section: Document Collection (mentioning)

confidence: 99%

“…• Data Layer: The data layer manages the storage of the document collection and the associated document metadata attributes. For this demo, we use the GovSensitivity [9] collection, which comprises government documents that are annotated for FOI sensitivities. The data layer further records the outputs from semantic categorisation, information threading and sensitivity classification along with the reviewers' sensitivity judgements.…”

Section: System Architecture (mentioning)

confidence: 99%

“…We identify semantic categories using document clustering, based on our previous work [9]. In particular, we deploy DEC [13], which is a popular deep neural clustering approach that simultaneously learns feature representation and clustering assignments.…”

Section: System Architecture (mentioning)

confidence: 99%

“…As briefly discussed in Section 1, the sequential review of documents that belong to a semantic category can facilitate the understanding of associated sensitivities in the category to reviewers. Therefore, this sequential review of semantically related documents can improve the reviewers' reviewing speed and assist them in providing consistent judgements for related documents [9]. As shown in Figure 4, the reviewers are presented with the identified semantic categories prioritised by their predicted sensitivity probability.…”

Section: Key Functionalities (mentioning)

confidence: 99%

Sensitivity Review of Large Collections by Identifying and Prioritising Coherent Documents Groups

Narvala

McDonald

Ounis

2022

Proceedings of the 31st ACM International Conference on Information & Knowledge Management

Self Cite

With the massive increase in the volume of digitally produced documents, government departments face a logistical issue when conducting the manual sensitivity review of documents that should be opened to the public. When reviewing a document, sensitivity reviewers often need to quickly access related information from other documents in the collection. For example, documents that mention the same topic or event can provide the reviewers with useful contextual information and assist the reviewers to make consistent sensitivity judgements more quickly. However, it is infeasible to manually identify groups of such related documents in large unstructured collections. In this work, we present a sensitivity review system that automatically identifies groups of related documents to assist reviewers and increase the efficiency of sensitivity review. In particular, our system groups the documents that are to be sensitivity reviewed based on the documents' semantic categories (e.g., criminality). Moreover, the system identifies chronological and coherent information threads to describe the full context of an event, activity or discussion that may be spread across multiple documents. Additionally, the system prioritises the identified semantic categories and information threads for review by leveraging automatic sensitivity classification to maximise the number of documents that can be opened to the public in a limited reviewing time-budget.

CCS Concepts: Information systems → Clustering and classification.

Displaying Evolving Events Via Hierarchical Information Threads for Sensitivity Review

Narvala,

McDonald,

Ounis

2024

Lecture Notes in Computer Science

Decision support for detecting sensitive text in government records

Branting,

Brown,

Giannella

et al. 2023

Artif Intell Law

Freedom of information laws promote transparency by permitting individuals and organizations to obtain government documents. However, exemptions from disclosure are necessary to protect privacy and to permit government officials to deliberate freely. Deliberative language is often the most challenging and burdensome exemption to detect, leading to high processing costs and delays in responding to open-records requests. This paper describes a novel deliberative-language detection model trained on a new annotated training set. The deliberative-language detection model is a component of a decision-support system for open-records requests under the US Freedom of Information Act, the FOIA Assistant, that ingests documents responsive to an open-records request, suggests passages likely to be subject to deliberative language, privacy, or other exemptions, and assists analysts in rapidly redacting suggested passages. The tool’s interface is based on extensive human-factors and usability studies with analysts and is currently in operational testing by multiple US federal agencies.

Abstract: There may be differences between this version and the published version. You are advised to consult the publisher's version if you wish to cite from it.
