2020
DOI: 10.1631/fitee.1900240

Web page classification based on heterogeneous features and a combination of multiple classifiers

Abstract: Precise web page classification can be achieved by evaluating features of web pages, and the structural features of web pages are effective complements to their textual features. Various classifiers have different characteristics, and multiple classifiers can be combined to allow classifiers to complement one another. In this study, a web page classification method based on heterogeneous features and a combination of multiple classifiers is proposed. Different from computing the frequency of HTML tags, we expl…
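The abstract is truncated here and does not say how the classifiers are combined. As a rough illustration of the general idea only, the sketch below assumes a soft-voting rule: each classifier (for example, one trained on textual features and one on structural features) emits class probabilities, and a weighted average picks the final label. The function names, labels, and weights are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Optional

# Class label -> probability, as emitted by one classifier for one page.
Probs = Dict[str, float]


def soft_vote(predictions: List[Probs],
              weights: Optional[List[float]] = None) -> Probs:
    """Weighted average of per-classifier class probabilities.

    Assumption: soft voting stands in for the paper's combination rule,
    which the truncated abstract does not specify.
    """
    if not predictions:
        raise ValueError("need at least one classifier output")
    if weights is None:
        weights = [1.0] * len(predictions)
    if len(weights) != len(predictions):
        raise ValueError("one weight per classifier is required")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Collect every label any classifier mentioned.
    labels = set().union(*predictions)
    return {
        label: sum(w * p.get(label, 0.0)
                   for w, p in zip(weights, predictions)) / total
        for label in labels
    }


def predict(predictions: List[Probs],
            weights: Optional[List[float]] = None) -> str:
    """Return the label with the highest combined probability."""
    combined = soft_vote(predictions, weights)
    return max(combined, key=combined.get)


# Hypothetical outputs for a single page.
text_clf = {"news": 0.6, "shop": 0.4}       # classifier on textual features
structure_clf = {"news": 0.3, "shop": 0.7}  # classifier on structural features

label_equal = predict([text_clf, structure_clf])
label_weighted = predict([text_clf, structure_clf], weights=[3.0, 1.0])
```

With equal weights the structural classifier's stronger preference wins; weighting the textual classifier three times as heavily flips the decision. In practice such weights would normally be tuned on held-out data.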

Citations: cited by 10 publications (10 citation statements)
References: 19 publications
“… seven studies that used a combination of HTML tag structure and text content, as shown in [10], [11], [19], [20], [24], [25], [27]; six studies that used images as shown in [9], [12]-[14], [16], [17]; three studies that used the feature of HTML tags structure as shown in [8], [15], [28]; two studies that used each feature of text content as shown in [22], [23]; two studies that used URL features as shown in [21],…”
Section: Results (mentioning)
confidence: 99%
“…The task of multi-label website classification has received little research attention; however, we focused on it as it would help us to establish our methodology. In the multilabel website classification context, Deng and Shen [49] approached the methodology by combining deep learning and machine learning to benefit from the ability to extract high-level features from a large amount of raw data and the ability to process high-dimensional features provided by LSTM and SVM, respectively. The proposed method was better than using SVM or LSTM independently in terms of accuracy.…”
Section: B: Multi-label Website Classification (mentioning)
confidence: 99%
“…In a similar spirit to our work, researchers in the past also explored the effectiveness of visual features in classifying Web pages (de Boer et al 2010). More recently, researchers have started to explore deep architectures based on LSTM (Deng, Du, and Shen 2020), GRU (Du, Han, and Zhao 2018), and BERT (Gupta and Bhatia 2021) applied to textual and HTML features. Although, unlike us, focusing only on a single language or a subset of very popular websites, these directions have shown how performance can be improved with the help of more complex models.…”
Section: Related Work (mentioning)
confidence: 99%