2006
DOI: 10.1007/11735106_41

PERC: A Personal Email Classifier

Cited by 7 publications (5 citation statements)
References 6 publications
“…ERPANET [11]) and Automatic Metadata Generation (AMG) at the Catholic University of Leuven ([2]), and the extraction of bibliographic information from medical articles, based on the detection of contiguous blocks and fuzzy pattern matching, is available from Medical Article Record System (MARS) ([42]) developed at the US National Library of Medicine (NLM) ([30]). There have also been previous work on metadata extraction from scientific articles in postscript using a knowledge base of stylistic cues ([19], [20]) and, from the language processing community, there have been results in automatic categorisation of emails ([6], [24]), text categorisation ([39]) and document content summarisation ([43]). Other communities have used image analysis for information extraction from the Internet ([3]), document white space analysis ([9]), graphics recognition in PDF files ([41]), and algorithms for page segmentation ([40]).…”
Section: Background and Objective (mentioning)
confidence: 99%
“…We are also hoping to gather some information from users of the Corpus online 6. Further labelling performed by volunteer classifiers from other background using the classification system available online 7 may also help to understand the extensibility of the results in this paper.…”
Section: Overall Conclusion (mentioning)
confidence: 96%
“…3 Past efforts in automated metadata extraction (e.g. [4], [6], [16], dc-dot metadata editor;4 [1], [7]) employ methods that often rely on structural elements or presentation styles found to be common among the documents. These structural elements or styles are closely bound to the genre of the document, hence, it seems reasonable that a better understanding of the genre of documents and how they are used in information search would be a key step in developing a broadly effective metadata extraction tool.…”
Section: Introduction (mentioning)
confidence: 99%
“…Previous work exists on the extraction of descriptive metadata extraction within specific domains or genres (e.g. MetadataExtractor, DC-dot, Automatic Metadata Generation, Thoma, 2001; Giuffrida, Shek, & Yang, 2000; Han, Giles, Manavoglu, Zha, Zhang, & Fox, 2000; Bekkerman, McCallum, & Huang, 2004; Ke, Bowerman, & Oakes, 2006; Sebastiani, 2002; and Witte, Krestel, & Bergler, 2005). However, a general tool has yet to be developed to extract metadata from documents of varied forms and subjects.…”
Section: Introduction (mentioning)
confidence: 99%