Proceedings of the 2013 ACM Symposium on Document Engineering 2013
DOI: 10.1145/2494266.2494282

Searching online book documents and analyzing book citations

Abstract: Academic search engines and digital libraries provide convenient online search and access facilities for scientific publications. However, most existing systems do not include books in their collections although several books are freely available online. Academic books are different from papers in terms of their length, contents and structure. We argue that accounting for academic books is important in understanding and assessing scientific impact. We introduce an open-book search engine that extracts and inde…

Cited by 9 publications (5 citation statements)
References 31 publications
“…P_start contains the first 50 pages in P. Previous work (Z. Wu, Das, et al, 2013; used the first 20 pages of the book for this step, but we found textbooks where that margin is not enough. The first TOC page is added to P_toc, which is the list containing all pages of the TOC.…”
Section: Logical Element Identification
Citation type: mentioning
confidence: 99%
“…Wu et al [10] described the use of SVM-based metadata extraction (SVMHeaderParse [56]) for CiteSeerX. However, this method is known to work poorly for metadata extraction of books [72].…”
Section: Metadata
Citation type: mentioning
confidence: 99%
“…Wu et al [56] gave a hybrid approach using SVM-based extractor and rule-based extractor for extracting authors and title of a book. Two sections that differentiate books from other scholarly documents is the presence of table of contents (TOC) [108] and indexes [106] [107], usually present at the back of the book.…”
Section: Sections and Additional Information
Citation type: mentioning
confidence: 99%
“…So on this dataset we cannot consider the location of index terms, but only serve the evaluation as a keyword extraction task. Books in the second dataset was collected from the Citeseer repository, most of which are in computer science and engineering [21]. We manually select 213 books with good quality back-of-the-book index.…”
Section: Datasets and Experimental Setup
Citation type: mentioning
confidence: 99%