DOI: 10.11606/d.55.2009.tde-06052009-154832

Evaluation of unsupervised feature selection methods for text mining (original title: Avaliação de métodos não-supervisionados de seleção de atributos para mineração de textos)

Abstract: To God, for the blessings received along this journey. To my parents, José Geraldo and Rita, for the unconditional love, support, and understanding I could count on at all times, at every stage of my life. If I have made it this far, it is because I looked up to you and leaned on you, examples of determination. I also thank my siblings, Túlio and Thayse, for their love and companionship. Having to leave daily life with all of you was the hardest decision I have ever made. But know that, even from 600 kilometers…

Cited by 9 publications (11 citation statements)
References 34 publications (47 reference statements)

“…Next, terms are extracted, and therefore, they are used to describe the text base, as detailed in Conrado [52]. To reduce the amount of terms to be worked with, a term selection is performed by using, e.g., the Luhn, Salton, and term variance, methods, which are detailed in the work of Nogueira [38].…”
Section: The Toptax Methodology (citation type: mentioning)
confidence: 99%
“…The zstf [38] measure, formally described in Equation 14, assumes that some parts of the document (such as the abstract and the conclusion) bring higher relevant information about the contents of the document than other parts. Based on this consideration, it attributes higher weights to the words that occur in parts of the document with higher impact or in which higher information related to the content of the document is concentrated.…”
Section: (A) Log Likelihood Ratio (LL) (citation type: mentioning)
confidence: 99%
“…Luhn [8] and LuhnDF [9] are semi-automatic methods that plot histograms from candidate terms based on, respectively, candidate frequencies (tf) and document frequencies (df). These histograms facilitate the visualization of any possible pattern that candidates may follow and, then, the histograms help to determine a threshold.…”
Section: Related Work (citation type: mentioning)
confidence: 99%
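
The document-frequency variant described in the statement above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used in the cited works: the toy corpus, the function names `document_frequencies` and `select_by_df`, and the cut-off values `low=2` and `high=3` are all hypothetical. In practice the cut-offs would be chosen by inspecting the histogram, which is what makes the method semi-automatic.

```python
from collections import Counter


def document_frequencies(docs):
    """Count, for each term, how many documents contain it (df)."""
    df = Counter()
    for tokens in docs:
        # set() ensures a term is counted once per document.
        df.update(set(tokens))
    return df


def select_by_df(df, low, high):
    """Keep terms whose document frequency lies in [low, high]."""
    return sorted(term for term, n in df.items() if low <= n <= high)


# Hypothetical toy corpus, already tokenized.
docs = [
    ["text", "mining", "term", "selection"],
    ["text", "term", "histogram"],
    ["text", "mining", "threshold"],
    ["text", "cluster"],
]

df = document_frequencies(docs)

# Plain-text histogram, most frequent terms first, to help pick cut-offs.
for term, n in sorted(df.items(), key=lambda kv: (-kv[1], kv[0])):
    print(f"{term:<10} {'#' * n}")

# Cut-offs assumed here for illustration; normally read off the histogram.
selected = select_by_df(df, low=2, high=3)
print(selected)
```

With these assumed cut-offs, terms that appear in every document and terms that appear in only one are both discarded, leaving the mid-frequency terms as candidates.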
“…On the other hand, unsupervised feature selection algorithms may be employed in unlabeled datasets. Nogueira (2009) presents a comparison of some unsupervised feature selection algorithms for Text Mining. The most commonly used method is the Luhn's method (Luhn, 1958).…”
Section: Pre-processing (citation type: mentioning)
confidence: 99%