Medical concept normalization helps in discovering standard concepts in free-form text i.e., maps health-related mentions to standard concepts in a clinical knowledge base. It is
Medical concept normalization (MCN) i.e., mapping of colloquial medical phrases to standard concepts is an essential step in analysis of medical social media text. The main drawback in existing state-of-the-art approach (Kalyan and Sangeetha, 2020b) is learning target concept vector representations from scratch which requires more training instances. Our model is based on RoBERTa and target concept embeddings. In our model, we integrate a) target concept information in the form of target concept vectors generated by encoding target concept descriptions using SRoBERTa, state-of-the-art RoBERTa based sentence embedding model and b) domain lexicon knowledge by enriching target concept vectors with synonym relationship knowledge using retrofitting algorithm. It is the first attempt in MCN to exploit both target concept information as well as domain lexicon knowledge in the form of retrofitted target concept vectors. Our model outperforms all the existing models with an accuracy improvement up to 1.36% on three standard datasets. Further, our model when trained only on mapping lexicon synonyms achieves up to 4.87% improvement in accuracy.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.