2010 Second International Workshop on Education Technology and Computer Science 2010
DOI: 10.1109/etcs.2010.130

A Refined TF-IDF Algorithm Based on Channel Distribution Information for Web News Feature Extraction

Cited by 10 publications (8 citation statements)
References 7 publications
“…Yuan et al [15] utilized a variety of features including statistical features, location features, and part of speech features to evaluate the weight of candidate keywords. Some scholars have also improved the algorithm from the perspective of news text category [16], [17]. Xu et al [17] were convinced that the news from each of the categories have some proper nouns that appear frequently in the document, but are not meaningful.…”
Section: B Text Feature Selection (mentioning)
confidence: 99%
“…The figure below describes our step to convert gathered information to be document vector. There is two factors that are used in common information processing system [9]: TF, the frequency of the term in a text segment, and IDF, which is used to indicate the distinction of the term.…”
Section: B the Weighting Scheme (mentioning)
confidence: 99%
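
The two factors named in the statement above, TF and IDF, can be illustrated with a minimal sketch of the plain TF-IDF weighting. This is not the refined, channel-distribution-based variant proposed in the cited paper, whose formula is not given here. The function name `tf_idf`, the length-normalised term frequency, the natural-log IDF, and the toy documents are all assumptions made for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Plain TF-IDF (illustrative, not the paper's refined variant).

    docs: list of token lists.
    Returns one {term: weight} dict per document.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            # TF: relative frequency of the term in this document.
            # IDF: log of (number of documents / documents containing the term).
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy corpus: two finance items and one sports item.
docs = [
    ["stock", "market", "falls"],
    ["stock", "market", "rises"],
    ["team", "wins", "match"],
]
weights = tf_idf(docs)
```

In this toy corpus, "falls" occurs in only one document while "stock" occurs in two, so "falls" receives the higher weight in the first document. A term that occurs in every document would receive a weight of zero.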
“…The Vector Space Model is commonly used structured form for text data in which individual text documents are represented as a set of vectors [9]. Later the matrix M would be converted into single vector V (word) so that those collections of words could be clustered using k-means algorithm.…”
Section: Vector Space Model (mentioning)
confidence: 99%
“…run a Hidden Markov Model [11] speech recognition algorithm on media files for transcribing the speech to text; -compute the TF-IDF measure for all articles and transcriptions, as specified in the CDI IDF algorithm [12]; -compute the page rank of articles, as specified in the weighted pagerank algorithm [13]; -perform a topic extraction routine on articles and transcriptions, as specified in the Latent Dirichlet Allocation algorithm [14].…”
Section: Metadata Processes (mentioning)
confidence: 99%