“…In particular, we first describe: (1) the document collection used in the studies and for training the clustering approaches, (2) the specific clustering approaches that we evaluate, and (3) the selection of the appropriate number of clusters in the collection. Sensitivity Collection: To train the clustering approaches, we use a collection (GovSensitivity [16]) of 3801 government documents (502 of them sensitive) that are annotated at the document level and sentence level by government sensitivity reviewers for two FOI sensitivities, i.e., "Personal Information" and "International Relations". In the user studies, we use passages of the documents instead of the documents themselves to reduce the complexity of reviewing large documents.…”