2015
DOI: 10.1007/978-3-319-16354-3_88

The iCrawl Wizard – Supporting Interactive Focused Crawl Specification

Abstract: Collections of Web documents about specific topics are needed for many areas of current research. Focused crawling enables the creation of such collections on demand. Current focused crawlers require the user to manually specify starting points for the crawl (seed URLs). These are also used to describe the expected topic of the collection. The choice of seed URLs influences the quality of the resulting collection and requires a lot of expertise. In this demonstration we present the iCrawl Wizard, a t…

Cited by 7 publications (6 citation statements)
References 3 publications (3 reference statements)
“…In addition, the most commonly used methods for seed URL selection are manual selection [7][8][9], selection from open directories such as DMOZ and curlie.org [10,11], and selection from URLs shared by users on social media such as Twitter [12,13]. Beyond these, there are also studies, particularly on focused crawlers, that take the URLs returned by searches on search engines such as Google and Yahoo as seed URLs [14][15][16][17].…”
Section: Seed URL Selection (unclassified)
“…The Collection Specification describes the topical and temporal scope of relevant documents to be included in the event-centric collection. Like the approaches for search and focused crawling on the live Web proposed in (Gossen et al, 2015b), a Collection Specification can indicate the intended topical scope through keywords and named entities. Furthermore, examples of relevant documents can narrow down the collection scope.…”
Section: Event-centric Collection Specification (mentioning)
confidence: 99%
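The statement above describes a specification that indicates topical scope through keywords and named entities. As a rough illustration only, and not the method of the cited work or of the iCrawl Wizard, the following Python sketch shows one way such a specification could be represented and used to score a document. The class name `CollectionSpec`, the `relevance` method, and the naive substring matching are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class CollectionSpec:
    """Hypothetical container for the topical scope of a collection."""

    keywords: set[str] = field(default_factory=set)
    entities: set[str] = field(default_factory=set)

    def relevance(self, text: str) -> float:
        """Return the fraction of keywords and entities found in the text.

        Matching is a case-insensitive substring test, which is an
        assumption of this sketch; a real crawler would likely tokenize
        the text and recognize entities properly.
        """
        terms = {term.lower() for term in self.keywords | self.entities}
        if not terms:
            return 0.0
        lowered = text.lower()
        hits = sum(1 for term in terms if term in lowered)
        return hits / len(terms)


# Illustrative specification and document; both are made up.
spec = CollectionSpec(keywords={"election", "vote"}, entities={"Berlin"})
score = spec.relevance("Berlin prepares for the election.")
```

A crawler could compare such a score against a threshold to decide whether to keep a page or follow its links. The temporal scope and the example documents mentioned in the statement are not modeled here.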
“…The problem of knowing what to collect from the web has also been treated in the digital library research community as a focused crawling problem. In focused crawling the goal is to collect content about particular topics (Risse et al, 2012), events (Klein, Balakireva, & Van de Sompel, 2018; Yang, Chitturi, Wilson, Magdy, & Fox, 2012), or to collect content that has a particular characteristic such as popularity (Page, Brin, Motwani, & Winograd, 1999), importance (Baeza-Yates, Marin, Castillo, & Rodriguez, 2005) or social engagement (Gossen, Demidova, & Risse, 2015; Milligan, Ruest, & Lin, 2016; Nwala, Weigle, & Nelson, 2018). Generally speaking these approaches take the focus to be a topic, event, person, organization that can be qualified by the types of media (documents, audio, video).…”
Section: Digital Libraries (mentioning)
confidence: 99%