Proceedings of the Tenth International Conference on Information and Knowledge Management 2001
DOI: 10.1145/502585.502593
Extracting meaningful labels for WEBSOM text archives

Abstract: Self-Organizing Maps, being used mainly with data that are not pre-labeled, need automatic procedures for extracting keywords as labels for each of the map units. The WEBSOM methodology for building very large text archives has a very slow method for extracting such unit labels. It computes the relative frequencies of all the words of all the documents associated with each unit and then compares these to the relative frequencies of all the words of all the other units of the map. Since maps may have more than 10…
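The labeling procedure the abstract describes — scoring words by their relative frequency within a unit's documents against their relative frequency in the rest of the map — can be sketched roughly as follows. This is an illustrative reconstruction, not the WEBSOM implementation; the function name `unit_labels` and the ratio-based score are assumptions.

```python
from collections import Counter

def unit_labels(unit_docs, top_k=3):
    """Hypothetical sketch of WEBSOM-style unit labeling: for each map
    unit, score words by their relative frequency in that unit's
    documents divided by their relative frequency in all other units'
    documents, and keep the top-scoring words as labels."""
    def rel_freq(docs):
        counts = Counter(w for doc in docs for w in doc)
        total = sum(counts.values()) or 1
        return {w: c / total for w, c in counts.items()}

    labels = {}
    for unit, docs in unit_docs.items():
        here = rel_freq(docs)
        # pooled documents of every *other* unit — this is the slow part
        # the paper criticizes: it is recomputed for each unit.
        elsewhere = rel_freq(
            [d for u, ds in unit_docs.items() if u != unit for d in ds]
        )
        scored = {w: f / (elsewhere.get(w, 0.0) + 1e-9) for w, f in here.items()}
        labels[unit] = [w for w, _ in
                        sorted(scored.items(), key=lambda x: -x[1])[:top_k]]
    return labels
```

Because the score for every word in every unit depends on the pooled statistics of all other units, the cost grows with both vocabulary size and map size, which is why the abstract calls the method very slow for maps with many units.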

Cited by 9 publications (12 citation statements); references 13 publications (5 reference statements).
“…One important methodology that is effective in archiving up to seven million text documents is the WEBSOM [1], [2], [3], [4], [5], [6], [7], which uses a "Self-Organizing Map" (SOM) at the core of its archiving technique. A number of other SOM-based text archiving techniques have been described in the literature [8], [9], [10], [11]. They differ mainly in the preprocessing and postprocessing stages of the archiving process.…”
Section: Very Large Text Archives
Mentioning confidence: 99%
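The Self-Organizing Map at the core of the archiving technique cited above can be illustrated with a minimal sketch: document vectors are mapped onto a small 2-D grid, and each grid unit's weight vector is pulled toward documents for which it (or a neighbor) is the best match. This is a generic SOM under assumed hyperparameters, not the WEBSOM system itself.

```python
import numpy as np

def train_som(doc_vectors, grid=(4, 4), epochs=20, lr0=0.5, sigma0=1.5, seed=0):
    """Minimal illustrative SOM: fit a grid of weight vectors to
    high-dimensional document vectors with a Gaussian neighborhood."""
    rng = np.random.default_rng(seed)
    h, w = grid
    dim = doc_vectors.shape[1]
    weights = rng.random((h, w, dim))
    # grid coordinates of every unit, shape (h, w, 2)
    coords = np.stack(np.meshgrid(np.arange(h), np.arange(w),
                                  indexing="ij"), axis=-1)
    for t in range(epochs):
        lr = lr0 * (1 - t / epochs)            # decaying learning rate
        sigma = sigma0 * (1 - t / epochs) + 0.1  # shrinking neighborhood
        for x in doc_vectors:
            # best-matching unit: grid cell whose weights are closest to x
            d = np.linalg.norm(weights - x, axis=2)
            bmu = np.unravel_index(np.argmin(d), d.shape)
            # Gaussian neighborhood centered on the BMU
            dist2 = ((coords - np.array(bmu)) ** 2).sum(axis=2)
            nb = np.exp(-dist2 / (2 * sigma ** 2))[..., None]
            weights += lr * nb * (x - weights)
    return weights
```

After training, each document is assigned to its best-matching unit, and nearby units hold similar documents — the property the WEBSOM methodology exploits for browsing very large archives.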
“…Our experiment on the trained SOM, using both the entire news collection and the subset used here, was reported in [10]. Details about the extracted keywords for the Reuters subset can be found in [11]. The assessment of the quality of the keywords extracted from both this news collection and the CNN collection is presented in the next section.…”
Section: Extracting Abstract For News Archives
Mentioning confidence: 99%
“…Other approaches, statistical and computational in nature, use data-driven machine learning algorithms to distinguish keywords from non-keywords; these include Genetic Algorithms, Support Vector Machines, Decision Trees, Self-Organizing Maps, and Artificial Neural Networks [17], [23].…”
Section: Introduction
Mentioning confidence: 99%
“…This is true especially for very large archives with millions or more articles. Processing all the words in the documents as if they were of equal importance, as the basis for finding relevant articles, would be slow and impractical [1]. That is why it is important to have a set of good keywords that represent the actual contents of the document.…”
Section: Introduction
Mentioning confidence: 99%