Proceedings of the 6th ACM/IEEE-CS Joint Conference on Digital Libraries, 2006
DOI: 10.1145/1141753.1141778

Automatic categorization of figures in scientific documents

Abstract: Figures are an important source of non-textual information in scientific documents. Current digital libraries do not provide users with tools to retrieve documents based on the information available within figures. We propose an architecture for retrieving documents by integrating figures and other information. The initial step in enabling integrated document search is to categorize figures into a set of predefined types. We propose several categories of figures based on their functionalities in scholarly ar…

Cited by 23 publications (14 citation statements). References: 31 publications.

Citation statements:
“…For instance, Lu et al. [7] propose using the content of figures to search scientific literature in digital libraries. The work described in this paper focuses only on the categorization of figures included in scientific documents.…”
Section: Related Work; citation type: mentioning; confidence: 99%
“…In digital libraries, scientific documents contain a great deal of non-textual information in the form of figures. To help digital library end users look for information within figures, several scholars have begun to explore effective methods for retrieving figure content in scientific digital libraries [10,11].…”
Section: Complex Information Queries; citation type: mentioning; confidence: 99%
“…Applying machine learning (ML) to one of the standard DL circulation activities, namely text categorization [48], is part of the cognitive toolbox deployed [18]. In this context, ML is being tested extensively in different development areas and scenarios; to name but a few: extracting image content from figures in scientific documents for categorization [33,34], automatically assessing and characterizing resource quality for educational DLs [54,5], assessing the quality of scientific conferences [37], web-based collection development [42], automated document metadata extraction by support vector machines (SVM, [24]), automatic extraction of titles from general documents [27], information architecture [17], removing duplicate documents [9], collaborative filtering [59], automatically expanding domain-specific lexicons by term categorization [3], generating visual thesauri [45], and the semantic markup of documents [13]. As part of this line of research, ML is being tested for its ability to reproduce parts of collections indexed by widespread classification schemes in a supervised learning setting, such as automatic text categorization using the Dewey Decimal Classification (DDC, [52]) or the Library of Congress Classification (LCC) from Library of Congress Subject Headings (LCSH, [20,43]).…”
Section: Introduction; citation type: mentioning; confidence: 99%
“…The second one is that the setup includes lexical resources as the interpretational context of content elements in text documents [8]. Although such a contextual representation scheme has been tested for medical literature in a DL setting [44], wavelet analysis is mostly applied in image, video, and audio processing, based on analogical information representation [33,14,31,35,39].…”
Section: Introduction; citation type: mentioning; confidence: 99%