This article surveys work on Unsupervised Learning of Morphology. We define Unsupervised Learning of Morphology as the problem of inducing a description (of some kind, even if only a morpheme segmentation) of how orthographic words are built up, given only raw text data of a language. We briefly go through the history and motivation of this problem. Next, over 200 items of work are listed with a brief characterization, and the most important ideas in the field are critically discussed. We summarize the achievements so far and give pointers for future developments.
Historical linguistics, the oldest branch of modern linguistics, deals with language relatedness and language change across space and time. Historical linguists apply the widely tested comparative method [Durie and Ross, 1996] to establish relationships between languages, to posit language families, and to reconstruct the proto-language of a language family. Although historical linguistics has origins parallel to those of biology [Atkinson and Gray, 2005], mainstream historical linguists, unlike biologists, have seldom been enthusiastic about using quantitative methods to discover language relationships or to investigate the structure of a language family; Kroeber and Chrétien [1937] and Ellegård [1959] are the exceptions. A short period of enthusiastic application of quantitative methods, initiated by Swadesh [1950], ended with the heavy criticism levelled against it by Bergsland and Vogt [1962]. Apart from two noteworthy doctoral dissertations, by Sankoff [1969] and Embleton [1986], the field of computational historical linguistics received little attention again until the beginning of the 1990s.

In traditional lexicostatistics, as introduced by Swadesh [1952], distances between languages are based on human expert cognacy judgments of items in standardized word lists, e.g., the Swadesh lists [Swadesh, 1955]. In the terminology of historical linguistics, cognates are related words across languages that can be traced directly back to the proto-language. Cognates are identified through regular sound correspondences. Sometimes cognates have a similar surface form and related meanings. Examples of such revealing cognates are:
English night ∼ German Nacht 'night' and English hound ∼ German Hund 'dog'. If a word has undergone many changes, its relatedness is not obvious from visual inspection, and one needs to look into the history of the word to understand exactly the sound changes that resulted in its synchronic form. For instance, English wheel ∼ Hindi chakra 'wheel' are cognates and can be traced back to the Proto-Indo-European root *kʷekʷlo-.
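The lexicostatistical distances mentioned above can be sketched in a few lines. This is a minimal sketch, assuming a common formulation that the text itself does not spell out: the distance between two languages is one minus the fraction of compared meanings whose words an expert has judged cognate. The function names `lexicostatistic_distance` and `distance_matrix`, the language names, and the cognate-class IDs are all hypothetical, chosen only for illustration.

```python
from itertools import combinations


def lexicostatistic_distance(a: dict[str, int], b: dict[str, int]) -> float:
    """Return 1 - (cognate meanings / compared meanings).

    a and b map a meaning (e.g. a Swadesh-list item) to the cognate-class ID
    that an expert assigned to that language's word. Meanings missing from
    either list are skipped (an assumption; other treatments exist).
    """
    shared = [m for m in a if m in b]
    if not shared:
        raise ValueError("no overlapping meanings to compare")
    # Two words are counted as cognate if they share a cognate-class ID.
    cognates = sum(1 for m in shared if a[m] == b[m])
    return 1.0 - cognates / len(shared)


def distance_matrix(lists: dict[str, dict[str, int]]) -> dict[tuple[str, str], float]:
    """Compute the distance for every unordered pair of languages."""
    return {
        (x, y): lexicostatistic_distance(lists[x], lists[y])
        for x, y in combinations(sorted(lists), 2)
    }


# Hypothetical toy data: cognate-class IDs for four meanings in three languages.
wordlists = {
    "LangA": {"night": 1, "dog": 2, "water": 3, "hand": 4},
    "LangB": {"night": 1, "dog": 5, "water": 3, "hand": 4},
    "LangC": {"night": 6, "dog": 7, "water": 3, "hand": 8},
}

print(distance_matrix(wordlists))
# {('LangA', 'LangB'): 0.25, ('LangA', 'LangC'): 0.75, ('LangB', 'LangC'): 0.75}
```

Here LangA and LangB share cognates for three of four meanings (distance 0.25), while LangC shares only one with either (distance 0.75). The resulting matrix is the kind of input a distance-based tree-building method would consume. Note that the expert cognacy judgments themselves, identified through regular sound correspondences, are supplied as input and are not computed.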