2006
DOI: 10.1145/1138379.1138380

Automatic expansion of domain-specific lexicons by term categorization

Abstract: We discuss an approach to the automatic expansion of domain-specific lexicons, that is, to the problem of extending, for each category c_i in a predefined set […]. Our approach relies on term categorization, defined as the task of labeling previously unlabeled terms according to a predefined set of domains. We approach this as a supervised learning problem in which term classifiers are built using the initial lexicons as training data. Dually to classic text categorization tasks, in which documents are represented as vectors, […]

Cited by 13 publications (18 citation statements)
References 32 publications
“…We chose to test the system using the dataset described in [1] referred in the following as DS. It is composed by a set of 27048 nouns assigned to one or more classes out of 42 different categories.…”
Section: Results (citation type: mentioning)
confidence: 99%
“…In [1], the authors approach the term categorization problem as the dual of text categorization. They validated the proposed model attempting to automatically replicate the WordNetDomains [2] lexicon (an extension to WordNet in which the synsets have been categorized into a subset of the DDC 1 scheme) by exploiting the Reuters Corpus.…”
Section: Introduction (citation type: mentioning)
confidence: 99%
“…The resultant feature vectors are then used by a centroid-based classifier using cosine similarity measure to label the words. Avancini, Lavelli, Sebastiani, and Zanoli (2006) take a classification approach to semantic lexicon construction. They cast the problem as a term (meaning both words and phrases) categorization task (dual of the document categorization task), and similar to the bag-of-word model, they represent the terms as bag-of-documents.…”
Section: Related Work (citation type: mentioning)
confidence: 99%
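The excerpt above describes the core idea: terms are represented as bags of documents (the dual of the bag-of-words document representation), and a centroid-based classifier with cosine similarity assigns each unlabeled term to a domain. A minimal sketch of that scheme follows; the toy corpus, the seed lexicons, and all names are invented for illustration and are not taken from the paper:

```python
import math

# Toy corpus: each document is a list of terms. In the bag-of-documents
# representation, a term's vector has one dimension per document.
docs = [
    ["stock", "market", "bond"],
    ["stock", "trade", "profit"],
    ["cell", "protein", "gene"],
    ["gene", "protein", "market"],
]

# Initial domain lexicons used as training data (hypothetical seeds).
seed_lexicons = {"finance": {"stock", "bond"}, "biology": {"cell", "gene"}}

def term_vector(term):
    """Bag-of-documents vector: dimension j = count of `term` in doc j."""
    return [doc.count(term) for doc in docs]

def cosine(u, v):
    """Cosine similarity between two term/centroid vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

# One centroid per domain: the mean of its seed terms' vectors.
centroids = {}
for domain, terms in seed_lexicons.items():
    vecs = [term_vector(t) for t in terms]
    centroids[domain] = [sum(col) / len(vecs) for col in zip(*vecs)]

def categorize(term):
    """Label an unlabeled term with the most cosine-similar domain centroid."""
    return max(centroids, key=lambda d: cosine(term_vector(term), centroids[d]))

print(categorize("trade"))    # co-occurs with finance seeds -> finance
print(categorize("protein"))  # co-occurs with biology seeds -> biology
```

Here "trade" is pulled toward the finance centroid because it appears in the same documents as the finance seed terms, which is the intuition behind treating documents, rather than words, as the feature space for terms.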
“…To apply machine learning (ML) to one of the standard DL circulation activities, namely text categorization [48], is part of the cognitive toolbox deployed [18]. In this context, ML is extensively being experimented with in different development areas and scenarios; to name but a few, for extracting image content from figures in scientific documents for categorization [33,34], automatically assessing and characterizing resource quality for educational DL [54,5], assessing the quality of scientific conferences [37], web-based collection development [42], automated document metadata extraction by support vector machines (SVM, [24]), automatic extraction of titles from general documents [27], information architecture [17], to remove duplicate documents [9], for collaborative filtering [59], for the automatic expansion of domain-specific lexicons by term categorization [3], for generating visual thesauri [45], or the semantic markup of documents [13]. As part of this direction of research, ML is being tested for its ability to reproduce parts of collections indexed by widespread classification schemes in a supervised learning setting, such as automatic text categorization using the Dewey Decimal Classification (DDC, [52]), or the Library of Congress Classification (LCC) from Library of Congress Subject Headings (LCSH, [20,43]).…”
Section: Introduction (citation type: mentioning)
confidence: 99%