Findings of the Association for Computational Linguistics: EMNLP 2021
DOI: 10.18653/v1/2021.findings-emnlp.311

RelDiff: Enriching Knowledge Graph Relation Representations for Sensitivity Classification

Abstract: The relationships that exist between entities can be a reliable indicator for classifying sensitive information, such as commercially sensitive information. For example, the relation person-IsDirectorOf-company can indicate whether an individual's salary should be considered as sensitive personal information. Representations of such relations are often learned using a knowledge graph to produce embeddings for relation types, generalised across different entity-pairs. However, a relation type may or may not cor…

Citations: cited by 2 publications (3 citation statements)
References: 26 publications
“…In particular, we first describe: (1) the document collection used in the studies and for training clustering approaches, (2) the specific clustering approaches that we evaluate, (3) selection of the appropriate number of clusters in the collection. Sensitivity Collection: To train the clustering approaches we use a collection (GovSensitivity [16]) of 3801 government documents (502 sensitive) that are annotated at document-level and sentence-level by government sensitivity reviewers for two FOI sensitivities, i.e, "Personal Information" and "International Relations". In the user studies we use passages of the documents instead of the documents itself to reduce the complexity in reviewing large documents.…”
Section: Preliminary Setup (mentioning)
confidence: 99%
“…We deployed an SVM text classification approach as described in [13] to classify the documents as either sensitive or non-sensitive. To train the classifier, we used a 5-fold cross validation with stratified samples of the GovSensitivity collection as described in [16]. The effectiveness of the learned classifier was 0.733 BAC.…”
Section: User Study #2: Review Openness (mentioning)
confidence: 99%