2021
DOI: 10.1007/s40747-021-00471-1

IHWC: intelligent hidden web crawler for harvesting data in urban domains

Abstract: Due to the massive size of the hidden web, searching, retrieving and mining rich, high-quality data can be a daunting task. Moreover, with the presence of forms, data cannot be accessed easily. Forms are dynamic, heterogeneous and spread over trillions of web pages. Significant efforts have addressed the problem of tapping into the hidden web to integrate and mine rich data. Effective techniques, as well as their application in special cases, need to be explored to achieve a good harvest rate. One s…

Cited by 4 publications (5 citation statements). References 33 publications (40 reference statements).
“…In comparison to the usual design, the implemented technique crawls in a way that allows a more equitable data distribution through plug-and-play. When compared with existing methods such as IHCM [16], SVM [17], NTPDCF [18], and SIMHAR [21], the implemented Hamming distance method achieves better performance values of 99.8%, 99.9%, 98%, and 99% in terms of accuracy, precision, recall, and F-measure.…”
Section: Discussion
confidence: 97%
“…This section provides a discussion of the implemented Hamming distance and compares the results with existing methods such as IHCM [16], SVM [17], NTPDCF [18], and SIMHAR [21] in the comparative analysis section. The main goal of the Hamming distance check is to avoid becoming the bottleneck in the pipeline, so these algorithms must be fast and efficient.…”
Section: Discussion
confidence: 99%
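The citing work stresses that the Hamming-distance check must be fast enough not to bottleneck the crawl pipeline. Below is a minimal sketch of how such a check might look for 64-bit simhash-style page fingerprints; the fingerprinting scheme, the 3-bit threshold, and all function names are illustrative assumptions, not the implementation evaluated in the citing paper.

```python
# Sketch: Hamming-distance deduplication over 64-bit simhash-style
# fingerprints. Fingerprint construction and the distance threshold are
# illustrative assumptions, not taken from the cited papers.
import hashlib


def simhash64(tokens):
    """Build a 64-bit simhash-style fingerprint from a token list."""
    weights = [0] * 64
    for tok in tokens:
        h = int.from_bytes(hashlib.md5(tok.encode()).digest()[:8], "big")
        for bit in range(64):
            weights[bit] += 1 if (h >> bit) & 1 else -1
    fp = 0
    for bit, w in enumerate(weights):
        if w > 0:
            fp |= 1 << bit
    return fp


def hamming_distance(a, b):
    """Number of differing bits between two 64-bit fingerprints."""
    return bin(a ^ b).count("1")


def is_near_duplicate(fp_new, seen_fps, threshold=3):
    """Flag a page as a near-duplicate if any stored fingerprint lies
    within `threshold` bits of the new one."""
    return any(hamming_distance(fp_new, fp) <= threshold for fp in seen_fps)


if __name__ == "__main__":
    seen = {simhash64("cheap hotels in new delhi book online".split())}
    candidate = simhash64("cheap hotel in new delhi book online".split())
    # A small bit distance (e.g. <= threshold) would flag near-duplicate content.
    print(hamming_distance(candidate, next(iter(seen))))
```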
“…Kaur [14] implemented an intelligent hidden web crawler (IHWC) method to address issues such as domain classification, URL prioritization, and avoiding exhaustive searching. Using rejection rules, the crawler selects appropriate web pages and ignores insignificant ones.…”
Section: Literature Survey
confidence: 99%
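The rejection-rule behaviour attributed to IHWC above can be pictured as a simple URL filter that discards insignificant pages before fetching. The patterns, urban-domain keywords, and the accept_url helper below are illustrative assumptions rather than the actual rules from the paper.

```python
# Sketch of rejection-rule filtering for a domain-specific crawler:
# URLs matching any rejection rule are skipped before fetching.
# Rule set and domain keywords are illustrative assumptions.
import re
from urllib.parse import urlparse

REJECT_PATTERNS = [
    re.compile(r"\.(jpg|png|gif|css|js|pdf)$", re.I),   # non-HTML resources
    re.compile(r"(login|signup|logout|advert)", re.I),  # insignificant pages
]
DOMAIN_KEYWORDS = {"hotel", "restaurant", "transport", "hospital"}  # urban domain


def accept_url(url):
    """Return True if the URL passes all rejection rules and looks
    relevant to the target (urban) domain."""
    if any(p.search(url) for p in REJECT_PATTERNS):
        return False
    path = urlparse(url).path.lower()
    # Keep URLs whose path mentions a domain keyword, or shallow pages
    # that still need to be explored for entry forms.
    return any(kw in path for kw in DOMAIN_KEYWORDS) or path.count("/") <= 1


urls = [
    "https://example.org/hotels/delhi",
    "https://example.org/assets/logo.png",
    "https://example.org/login",
]
print([u for u in urls if accept_url(u)])  # keeps only the hotel page
```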
“…Therefore, the theme (topic) crawler builds on the simple crawler to crawl more targeted content and gather more accurate information in a particular field, for example, using a web crawler to collect agricultural and financial information [9][10][11]. In a specific program run, the topic web crawler receives topic-related instructions before crawling the data.…”
Section: Data Collection Design
confidence: 99%
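A topic (theme) crawler of the kind described here can be sketched as a crawl loop that is seeded with topic keywords and only expands pages whose text matches them. The relevance score, the 0.3 threshold, and the fetch/extract_links callables are assumptions supplied for illustration, not the design used in the cited work.

```python
# Sketch of a topic-focused crawl loop: only pages whose text matches the
# supplied topic keywords are harvested and expanded. All helpers and
# thresholds are illustrative assumptions.
from collections import deque


def topic_score(text, keywords):
    """Fraction of topic keywords that appear in the page text."""
    text = text.lower()
    return sum(kw in text for kw in keywords) / len(keywords)


def focused_crawl(seed_urls, keywords, fetch, extract_links,
                  min_score=0.3, limit=100):
    """Breadth-first crawl that only expands topic-relevant pages.

    `fetch(url)` -> page text and `extract_links(url, text)` -> list of
    URLs are assumed to be supplied by the caller.
    """
    frontier, visited, collected = deque(seed_urls), set(), []
    while frontier and len(collected) < limit:
        url = frontier.popleft()
        if url in visited:
            continue
        visited.add(url)
        text = fetch(url)
        if topic_score(text, keywords) < min_score:
            continue  # off-topic page: neither harvested nor expanded
        collected.append(url)
        frontier.extend(extract_links(url, text))
    return collected


# Tiny in-memory example standing in for real fetching.
PAGES = {
    "u1": ("farm loan subsidy rates for wheat", ["u2", "u3"]),
    "u2": ("football scores and transfer news", []),
    "u3": ("agricultural credit and crop insurance", []),
}
result = focused_crawl(
    ["u1"],
    ["farm", "loan", "crop", "credit", "agricultural"],
    fetch=lambda u: PAGES[u][0],
    extract_links=lambda u, t: PAGES[u][1],
)
print(result)  # ['u1', 'u3'] with the default 0.3 relevance threshold
```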