Automatic de-identification of textual documents in the electronic health record: a review of recent research

Meystre, Stéphane M.; Friedlin, F. Jeffrey; South, Brett R.; Shen, Shuying; Samore, Matthew H.

doi:10.1186/1471-2288-10-70

Cited by 225 publications

(209 citation statements)

References 15 publications

Supporting

Mentioning: 198

Contrasting

Unclassified


“…The majority of patients were in favor of data sharing among researchers, especially within the field of AD research, although support for the sharing of personal information decreased with a higher risk of re-identifying the data [47][48][49][50][51][52][53][54]. Trinidad and colleagues [55] also found that both current and prospective research participants are generally supportive of the sharing of non-identifiable data among researchers.…”

Section: Discussion (mentioning)

confidence: 99%

When Patient Engagement and Research Ethics Collide: Lessons from a Dementia Forum

Robillard, Feng 2017

JAD


Abstract. The importance of patient engagement in research has been gaining recognition since the turn of the 21st century. However, little is known about the perspectives of people with dementia on the process of discovery. To fill this gap and to inform priorities in patient engagement in the context of dementia research, the Clinic for Alzheimer Disease and Related Disorders at the University of British Columbia hosted an interactive session for members of the patient community and of the general public to share their views on various ethical aspects of the research process. Results from the session indicate that several current research ethics policies and norms in dementia research are not in line with participants' preferences. Here we discuss the importance of bridging the gap between researchers and patients and call for reforms in current standards of dementia research.



“…An overview of approaches to PHI de-identification is provided by Meystre et al [8]. From their analysis, they concluded that methods based on linguistic resources, such as dictionaries, tend to perform better with rarely mentioned PHIs.…”

Section: De-identification (mentioning)

confidence: 99%

De-identification of health records using Anonym: Effectiveness and robustness across datasets

Zuccon, Kotzur, Nguyen et al. 2014

Artificial Intelligence in Medicine


Results: Anonym identifies and removes up to 96.6% of personal health identifiers (recall) with a precision of up to 98.2% on the i2b2 dataset, outperforming the best system proposed in the i2b2 challenge. The effectiveness of Anonym across datasets is found to depend on the amount of information available for training. Conclusion: Findings show that Anonym compares to the best approach from the 2006 i2b2 shared task. It is easy to retrain Anonym with new datasets; if retrained, the system is robust to variations of training size, data type and quality in presence of sufficient training data.


“…Later the need to sanitize the unstructured documents had come into notice. This need is revealed in initiatives from DARPA [11] or the Consortium for Healthcare Informatics Research (CHIR) [12] which aim at building new methods and tools for declassification of confidential documents. In the structured documents the structure itself provides the key to identify sensitive terms.…”

Section: Related Work (mentioning)

confidence: 99%

Automatic Declassification of Textual Documents by Generalizing Sensitive Terms

Vasudevan, John 2014

IJCA


With the advent of the internet, large numbers of text documents are published and shared every day. Each of these documents contains a vast amount of information. Publicly sharing some of this information may compromise privacy if the information is confidential. So before a document is published, sanitization operations are performed on it to preserve privacy while retaining the document's utility. Various schemes have been developed to solve this problem, but most of them turned out to be domain-specific, and most did not consider the presence of semantically correlated terms. This paper presents a generalized sanitization method that discovers sensitive information based on the concept of information content. The proposed method removes confidential information from the text document by first finding the independent sensitive terms. These sensitive terms are then used to discover the correlated terms that cause a disclosure threat. Finally, a generalization algorithm generalizes the sensitive and correlated terms with high disclosure risk.

