Models of Incremental Concept Formation

Gennari, John H.; Langley, Pat; Fisher, Douglas

doi:10.21236/ada199617

Cited by 45 publications

(54 citation statements)

References 14 publications

“…The most well-known conceptual clustering system is COBWEB [11,17]. It creates clusters that are characterized by the list of nominal attribute values and probabilities associated with them.…”

Section: 22 classit/agglom (mentioning)

confidence: 99%

An Evaluation of Techniques for Clustering Search Results

Leouski

Croft

2005

Abstract. The ability to effectively organize retrieval results becomes more important as the focus of Information Retrieval (IR) shifts towards interactive search processes. Automatic classification techniques are capable of providing the necessary information organization by arranging the retrieved data into groups of documents with common subjects. In this paper, we compare classification methods from IR and Machine Learning (ML) for clustering search results. Issues such as document representation, classification algorithms, and cluster representation are discussed. We introduce several evaluation techniques and use them in preliminary experiments. These experiments indicate that the proposed techniques have promise, but it is clear that user experiments are required to carry out more thorough evaluation.

This material is based on work supported in part by the National Science Foundation, Library of Congress and Department of Commerce under cooperative agreement number EEC-9209623. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect those of the sponsor. This material is based on work supported in part by NRaD Contract Number N66001-94-D-6054.

1 Introduction

An IR system typically produces a ranked list of documents in response to a user's query. These documents are presented to the user for examination and evaluation. Although the documents are ranked, there is significant potential benefit in providing additional structure in long retrieved lists.

The role of information organization becomes even more important in the interactive model of retrieval, where the focus is on the user's participation in a cycle of query formulation, presentation of search results, and query reformulation.

A natural alternative to ranking is to divide (or cluster) the retrieved set into groups of documents with common subjects. For example, consider a situation in which the system is presented with a general query. The retrieval results would contain a wide variety of topics in that general area. An automatic classification tool could create classes of similar documents, allowing the user to focus on a particular topic. In this paper we consider the problem of design and evaluation of such a browsing tool for an existing IR system.

We begin by discussing the recent research on clustering in IR and ML. Surprisingly, only a few systems have used clustering methods for organizing retrieval results. Moreover, there is virtually no literature about attempts to evaluate these techniques. Clustering has also been studied in Machine Learning (ML) for a relatively long time and a large number of algorithms have been developed. There have, however, been few applications of these techniques to IR [1].

We believe there are four major issues that need to be considered:

• the input of the classifier, or the document representations. In general, documents are treated as vectors of weight-term pairs. However, the questions of which terms to choose and whether to use the whole document...

“…In our experiments, we use the successor to COBWEB (CLASSIT [17]), that extends these ideas onto continuous attribute values. We had to modify CLASSIT's evaluation function to account for the missing terms [13]:…”

Section: 22 classit/agglom (mentioning)

confidence: 99%

An Evaluation of Techniques for Clustering Search Results

Leouski

Croft

2005

“…Concept formation systems construct a hierarchical organization of intensional concept definitions (Gennari et al, 1989), and should therefore be based on a theory of concept generality and subsumption. Description logics (Nebel, 1990) are a restricted first-order formalism with specialized inference rules for detecting extensional subsumption.…”

Section: Discovery In a Spatial Description Logic (mentioning)

confidence: 99%

Machine discovery of protein motifs

Conklin

1995

Mach Learn

Abstract. The investigation of relations between protein tertiary structure and amino acid sequence is a topic of tremendous importance in molecular biology. The automated discovery of recurrent patterns of structure and sequence is an essential part of this investigation. These patterns, known as protein motifs, are abstractions of fragments drawn from proteins of known sequence and tertiary structure. This paper has two objectives. The first is to introduce and define protein motifs, and provide a survey of previous research on protein motif discovery. The second is to present and apply a novel approach to protein motif representation and discovery, which is based on a spatial description logic and the symbolic machine learning paradigm of structured concept formation. A large database of protein fragments is processed using this approach, and several interesting and significant protein motifs are discovered.

“…Gennari et al (1989) limited the number of retained observations with the help of a program parameter, but such considerations are beyond the scope of the present paper.…”

Section: Hierarchy Building In Hierarch (mentioning)

confidence: 99%

A branch and bound incremental conceptual clusterer

Nevins

1995

Mach Learn

Editor: Tom Dietterich

Abstract. A computer program is described that is capable of learning multiple concepts and their structural descriptions from observations of examples. It decomposes this conceptual clustering problem into two modules. The first module is concerned with forming a generalization from a pair of examples by extracting their common structure and calculating an information measure for each structural description. The second module, which is the subject of this paper, incrementally incorporates these generalizations into a hierarchy of concepts. This second module operates without reference to any underlying representation language and utilizes only the information measure provided by the first module, while employing a branch and bound procedure to search the hierarchy for concepts from which to form new clusters. This ability to search the hierarchy is used as the basis of a hill climbing strategy which has as its goal the avoidance of local peaks so as to reduce the sensitivity of the program to the order in which the observations are encountered,
