2021
DOI: 10.1587/transinf.2020ntp0001

HAIF: A Hierarchical Attention-Based Model of Filtering Invalid Webpage

Cited by 2 publications
(2 citation statements)
References 20 publications
“…We used a convolutional neural network (CNN), the state-of-the-art tool in computer vision, to detect error Web pages and then separate them into a dedicated folder, all automatically. In this sense, Web pages that do not contain useful information are called "ERROR" Web pages, whereas Web pages that contain valuable information are called "VALID" Web pages [5]. Here, we present an automatic detection of error Web pages based exclusively on their webshots.…”
Section: Use Case 2: Automatic Recognition Of Error Web Pages (mentioning)
confidence: 99%
“…The effective management of such a quantity and variety of information is an increasingly difficult task for traditional techniques. For example, organizing valid content and filtering invalid content (purifying the Internet) are challenges facing the current Web [5]. The immense presence on the Internet of error Web pages (e.g., under construction, maintenance, domain offer, suspended account, page not found, browser incompatibility, virus, phishing, or service failure), which continue to be indexed and returned by search engines, affecting webmasters and users in general.…”
Section: Introduction (mentioning)
confidence: 99%