Ambiguity is a common phenomenon in text, especially in the biomedical domain. For instance, it is frequently the case that a gene, a protein encoded by the gene, and a disease associated with the protein share the same name. Resolving this problem, that is, assigning to an ambiguous word in a given context its correct meaning is called word sense disambiguation (WSD). It is a pre-requisite for associating entities in text to external identifiers and thus to put the results from text mining into a larger knowledge framework. In this chapter, we introduce the WSD problem and sketch general approaches for solving it. The authors then describe in detail the results of a study in WSD using classification. For each sense of an ambiguous term, they collected a large number of exemplary texts automatically and used them to train an SVM-based classifier. This method reaches a median success rate of 97%. The authors also provide an analysis of potential sources and methods to obtain training examples, which proved to be the most difficult part of this study.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.