You-Qian Lee scite author profile

You-Qian Lee

2 Publications

8 Citation Statements Received

17 Citation Statements Given


Affiliations

National University of Kaohsiung

Publications


Protected Health Information Recognition of Unstructured Code-Mixed Electronic Health Records in Taiwan

Lee, Wang, et al. 2022


Electronic health records (EHRs) at medical institutions provide valuable sources for research in both clinical and biomedical domains. However, before such records can be used for research purposes, protected health information (PHI) mentioned in the unstructured text must be removed. In Taiwan's EHR systems, the unstructured EHR texts are usually written in a mixture of English and Chinese, which brings challenges for de-identification. This paper presents, to the best of our knowledge, the first study of the construction of a code-mixed EHR de-identification corpus and of the evaluation of different mature entity recognition methods applied to the code-mixed PHI recognition task.


Principle-Based Approach for the De-Identification of Code-Mixed Electronic Health Records

Wang, Lee, et al. 2022

IEEE Access


Code-mixing is the phenomenon in which at least two languages are combined in a hybrid way within a single conversation. The use of mixed language is widespread in multilingual and multicultural countries and poses significant challenges for the development of automated language processing tools. In Taiwan's electronic health record (EHR) systems, the unstructured EHR texts are usually written in a mixture of English and Chinese, which makes the de-identification and synthesis of protected health information (PHI) difficult. We explored this problem by applying several state-of-the-art pretrained monolingual and multilingual language models, and we proposed a principle-based approach (PBA) for the tasks of PHI recognition and resynthesis on a code-mixed EHR corpus annotated with 6 main categories and 25 subcategories of PHI. In PBA, a hierarchical principle slot schema is defined to encode knowledge of code-mixed PHIs; the defined slots are learned from the training set and assembled into principles that recognize PHI mentions and synthesize surrogates at the same time. A semantic disambiguation process is developed to disambiguate ambiguous PHI categories during de-identification and to dynamically extend the knowledge encoded in PBA during knowledge augmentation. The experimental results demonstrate that the proposed method achieves the best micro- and macro-F-scores compared with the other monolingual and multilingual language models fine-tuned on our code-mixed corpus.
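The abstract reports results as micro- and macro-F-scores. As a rough illustration of how the two averages differ (a generic sketch, not the evaluation code used in the paper), the Python snippet below scores predicted PHI spans against gold spans. It assumes exact matching on (start, end, category) triples; the paper's matching criterion may differ, and the function names and the labels `NAME`, `DATE`, `LOCATION` are hypothetical placeholders rather than the corpus's actual tag set.

```python
from collections import Counter


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 = 2PR / (P + R), rewritten as 2TP / (2TP + FP + FN); 0.0 when all counts are zero."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(gold: set[tuple[int, int, str]],
                   pred: set[tuple[int, int, str]]) -> tuple[float, float]:
    """Score predicted PHI spans against gold spans.

    Each span is a hypothetical (start, end, category) triple; a prediction
    counts as a true positive only if all three fields match a gold span exactly.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for span in pred:
        # Correct predictions are TPs; anything else is a FP for its predicted category.
        (tp if span in gold else fp)[span[2]] += 1
    for span in gold - pred:
        # Gold spans never predicted exactly are FNs for their gold category.
        fn[span[2]] += 1

    # Micro: pool counts over all categories, then compute a single F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))

    # Macro: compute F1 per category, then take the unweighted mean.
    categories = {span[2] for span in gold | pred}
    macro = (sum(f1(tp[c], fp[c], fn[c]) for c in categories) / len(categories)
             if categories else 0.0)
    return micro, macro


if __name__ == "__main__":
    # Toy example; offsets and labels are made up for illustration.
    gold = {(0, 3, "NAME"), (10, 20, "DATE"), (25, 30, "DATE"), (40, 45, "LOCATION")}
    pred = {(0, 3, "NAME"), (10, 20, "DATE"), (25, 30, "NAME")}
    micro, macro = micro_macro_f1(gold, pred)
    print(f"micro-F1 = {micro:.3f}, macro-F1 = {macro:.3f}")  # micro-F1 = 0.571, macro-F1 = 0.444
```

With these toy spans, micro-F1 pools all counts (2 TP, 1 FP, 2 FN) to give 4/7 ≈ 0.571, while macro-F1 averages the per-category scores (2/3, 2/3, 0) to give 4/9 ≈ 0.444. The completely missed `LOCATION` category pulls the macro average down, which is why the two figures are usually reported together when PHI categories are imbalanced.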


scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations: citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA


Made with 💙 for researchers

Part of the Research Solutions Family.
