2008
DOI: 10.1007/s10115-008-0152-4
Using Wikipedia knowledge to improve text classification

Cited by 132 publications (68 citation statements)
References 9 publications
“…Examples include information retrieval [4,14,18], named entity disambiguation [1,2,7,8,11,12], text classification [25] and entity ranking [10]. To extract the content of an entity context, many studies directly used the Wikipedia article describing the entity [1,2,8,9,14,25-27]; some works extended the article with all the other Wikipedia articles linked to the Wikipedia article describing the entity [6,7,12]; while some only considered the first paragraph of the Wikipedia article describing the entity [2]. Different from these approaches, our graph-based approach not only employs in-links and language links to broaden the article set that is likely to mention the entity, but also performs a finer-grained process: extracting the sentences that mention the entity, such that all the sentences in our context are closely related to the target entity.…”
Section: Related Work (mentioning)
confidence: 99%
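As a rough illustration of the sentence-level context extraction the quoted passage describes, the Python sketch below scans a set of article texts (which a real system would gather from the entity's own Wikipedia page plus its in-linked and language-linked pages) and keeps only the sentences that mention the target entity. The `entity_context_sentences` helper, the naive regex sentence splitter, and the toy articles are all assumptions made here for illustration, not the cited authors' implementation.

```python
import re

def entity_context_sentences(entity, article_texts):
    """Collect sentences that mention `entity` from a set of article texts.

    Minimal sketch: the caller is assumed to have already fetched the
    entity's article and its in-linked / language-linked articles
    (e.g. via the MediaWiki API); they arrive here as plain strings.
    """
    pattern = re.compile(re.escape(entity), re.IGNORECASE)
    context = []
    for text in article_texts:
        # Naive sentence splitting on ., ! or ? followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", text):
            if pattern.search(sentence):
                context.append(sentence.strip())
    return context

# Hypothetical usage with two invented article snippets.
articles = [
    "Alan Turing was an English mathematician. He worked at Bletchley Park.",
    "Bletchley Park housed codebreakers. Turing designed the bombe there.",
]
print(entity_context_sentences("Turing", articles))
```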
“…As to the context-based representation vector of the entity, [1,11] defined it as the tf-idf/word count/binary occurrence values of all the vocabulary words in the context content; [2,19] defined it as the word count/binary occurrence values of other entities in the context content; [5,6,9,14,25] defined it as the tf-idf similarity values between the target entity's context content and other entities' context contents from Wikipedia; [27] defined it as the visiting probability from the target entity to other entities from Wikipedia; [7,26] used a measurement based on the common entities linked to the target entity and other entities from Wikipedia. Different from all former studies, we employ aspect weights that have a different interpretation of frequency and selectivity than the typical tf-idf values and take co-occurrence and language specificity of the aspects into account.…”
Section: Related Work (mentioning)
confidence: 99%
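The tf-idf and similarity-based context representations this passage surveys can be sketched with standard tooling. The snippet below, using scikit-learn purely as an illustration of the general idea rather than any cited system, builds one tf-idf vector per entity context and computes pairwise cosine similarities; the entity names and context strings are invented.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical context contents: one text blob per entity, e.g. the entity's
# Wikipedia article or the sentences collected around it.
contexts = {
    "Alan Turing":    "mathematician computer science enigma bletchley park",
    "Bletchley Park": "codebreaking site enigma turing world war",
    "Apple Inc.":     "technology company iphone cupertino hardware",
}

names = list(contexts)
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform([contexts[n] for n in names])  # one tf-idf row per entity

# Pairwise cosine similarity between entity contexts, as in the
# tf-idf-similarity representations the quoted passage mentions.
sim = cosine_similarity(X)
for i, n in enumerate(names):
    print(n, {m: round(float(sim[i][j]), 2) for j, m in enumerate(names) if j != i})
```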
“…In fact, it has been found in several experiments that more sophisticated representations do not yield any significant increase in effectiveness [34], although there are some recent approaches that have shown promise. For example, Wang automatically constructed a thesaurus of concepts from Wikipedia and introduced a unified framework to expand the bag-of-words representation with semantic relations [39]. More research is needed in order to establish whether this type of expansion really increases performance significantly over the traditional model.…”
Section: Post Similarity Analysis (mentioning)
confidence: 99%
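To make the bag-of-words expansion idea concrete, the sketch below adds Wikipedia concept features alongside plain word counts. The tiny thesaurus and the `CONCEPT::` feature naming are hypothetical stand-ins; the approach in [39] builds the concept thesaurus automatically from Wikipedia and also exploits semantic relations between concepts, which this toy version does not attempt.

```python
from collections import Counter

# Toy concept thesaurus: surface terms -> Wikipedia concepts they evoke.
# Invented purely for illustration.
THESAURUS = {
    "jaguar": ["Jaguar (animal)", "Jaguar Cars"],
    "python": ["Python (programming language)", "Pythonidae"],
}

def expand_bag_of_words(tokens, thesaurus=THESAURUS):
    """Augment a bag-of-words with Wikipedia concept features.

    Every token keeps its own count, and tokens found in the thesaurus
    also contribute counts for the concepts they map to, so a downstream
    classifier sees both words and concepts.
    """
    bag = Counter(tokens)
    for token in tokens:
        for concept in thesaurus.get(token, []):
            bag[f"CONCEPT::{concept}"] += 1
    return bag

print(expand_bag_of_words(["python", "code", "jaguar"]))
```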
“…Wikipedia was already applied in many studies on conceptualizing and contextualizing document collections. To name just a few recent examples, applications include clustering [3,4], assigning readable labels to the obtained document clusters [5,6], facilitating classification [7], or extracting keywords [8]. However, not much is known about the effectiveness of Wikipedia when it comes to processing scientific texts.…”
Section: Introduction (mentioning)
confidence: 99%