Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 1999
DOI: 10.1145/312624.312661
Information retrieval based on context distance and morphology

Abstract: We present an approach to information retrieval based on context distance and morphology. Context distance is a measure we use to assess the closeness of word meanings. This context distance model measures semantic distances between words using the local contexts of words within a single document as well as the lexical co-occurrence information in the set of documents to be retrieved. We also propose to integrate the context distance model with morphological analysis in determining word similarity so that the …
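The core idea — measuring the closeness of two words by comparing the co-occurrence profiles of their local contexts — can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the tokenizer, window size, and cosine-based distance are assumptions chosen for clarity.

```python
from collections import Counter, defaultdict
from math import sqrt

def cooccurrence_vectors(docs, window=2):
    """Collect word-word co-occurrence counts within a +/-window
    of each token, aggregated over the document collection."""
    vecs = defaultdict(Counter)
    for doc in docs:
        toks = doc.lower().split()
        for i, w in enumerate(toks):
            lo, hi = max(0, i - window), min(len(toks), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vecs[w][toks[j]] += 1
    return vecs

def context_distance(w1, w2, vecs):
    """Second-order distance: 1 minus the cosine similarity of the
    two words' co-occurrence vectors (0 = identical contexts)."""
    v1, v2 = vecs[w1], vecs[w2]
    dot = sum(v1[t] * v2[t] for t in v1.keys() & v2.keys())
    n1 = sqrt(sum(c * c for c in v1.values()))
    n2 = sqrt(sum(c * c for c in v2.values()))
    if n1 == 0.0 or n2 == 0.0:
        return 1.0
    return 1.0 - dot / (n1 * n2)
```

Words that appear in similar local contexts (e.g. "cat" and "dog" in parallel sentences) come out close under this measure even if they never co-occur directly, which is the point of using second-order statistics.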

Cited by 34 publications (22 citation statements); references 12 publications (2 reference statements).
“…In doing so, Rapp demonstrated that word vectors in such models contain an aggregate representation of the underlying semantics, which implies content rich vectors (i.e., those collecting co-occurrence statistics for a word used in multiple contexts and senses, as affixes are) can provide better representations of a given word's semantic content. Similarly, Jing and Tzoukermann (1999) demonstrated that morphological information improved calculations of semantic relatedness between two words (i.e., by computing the distance between their vectors using their own model based on second-order, word-word co-occurrence statistics).…”
Section: Performance Impact of Morphological Decomposition
confidence: 98%
“…Jing and Tzoukermann (1999) used externally provided morphological information and showed it improved calculations of semantic relatedness between two words (i.e., by computing the distance between their vectors using an implementation based on second-order, word-word lexical co-occurrence statistics). Harman (1991) found that stemming provided no performance improvement, regardless of the stemming algorithm used.…”
Section: Limitations of Corpus-based Semantic Space Models
confidence: 99%
“…At the end of the windowing process, an information theoretic measure is applied to compute the co-occurrence statistics between the targeting linguistic patterns and other tokens appearing in the same text window across the corpus. Thereby, context vectors [17], [41] can be created to describe the semantic of the extracted concepts.…”
Section: A Framework for Automatic Concept Map Generation
confidence: 99%
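The citing work above says only that "an information theoretic measure" is applied to the windowed counts; pointwise mutual information (PMI) is a common choice for that role and serves here as an assumed, illustrative example of how such a measure scores a pattern-token pair from co-occurrence statistics.

```python
from math import log2

def pmi(count_xy, count_x, count_y, total):
    """Pointwise mutual information between tokens x and y:
    log2( P(x,y) / (P(x) * P(y)) ), estimated from raw counts.
    Positive values mean x and y co-occur more than chance predicts."""
    p_xy = count_xy / total
    p_x = count_x / total
    p_y = count_y / total
    return log2(p_xy / (p_x * p_y))
```

A pair seen together 10 times out of 100 windows, where each token occurs 20 times overall, scores log2(0.1 / 0.04), i.e. well above zero, and would be kept as an associated context term; pairs at or below chance score zero or negative and would be discarded.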
“…Collocational expressions provide the contexts to extract the semantics of concepts embedded in natural language texts such as net news, blogs, emails, or Web documents [35]. In computational linguistics, a term refers to one or more tokens (words), and a term can also be taken as a concept if it carries recognizable meaning with respect to a context (domain) [17], [31].…”
Section: The Cognitive and Linguistic Foundations
confidence: 99%
“…Term co-occurrence has been used for many purposes (Jing and Tzoukermann 1999). However, they do not find communities.…”
Section: Related Work
confidence: 99%