2021
DOI: 10.1109/access.2021.3086586

High-Quality Train Data Generation for Deep Learning-Based Web Page Classification Models

Abstract: Current deep learning models for detecting relevant web pages show low accuracy because of the poor quality of their training data. In this paper, we propose a novel algorithm to automatically generate high-quality training data based on the frequency of the document including the entity of interest. Our experimental results with movie and cellphone data sets show that the average F1-score of the deep learning models (FNN, CNN, Bi-LSTM, and SeqGAN) trained with our proposed algorithm shows up to 0.9992 in F …
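For readers unfamiliar with the metric the abstract reports, the F1-score is the harmonic mean of precision and recall. The following is a minimal sketch of the standard definition only; the function name `f1_score` and the example counts are illustrative assumptions and are not taken from the paper.

```python
def f1_score(true_pos: int, false_pos: int, false_neg: int) -> float:
    """Standard F1: harmonic mean of precision and recall.

    The counts are hypothetical inputs for illustration; they are not
    results from the paper.
    """
    predicted_pos = true_pos + false_pos
    actual_pos = true_pos + false_neg
    # Guard against division by zero when a class is never predicted or never present.
    precision = true_pos / predicted_pos if predicted_pos else 0.0
    recall = true_pos / actual_pos if actual_pos else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Illustrative counts only: 8 relevant pages found, 2 false alarms, 2 misses.
    print(f1_score(8, 2, 2))
```

An F1-score close to 1 requires both precision and recall to be close to 1, so the metric penalizes a classifier that finds relevant pages only by also accepting many irrelevant ones.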

Citations: cited by 4 publications (1 citation statement)
References: 27 publications (32 reference statements)
“…An algorithm to automatically generate high-quality training data-based on the frequency of the document including the entity of interest is proposed in Ref. [ 22 ]. Also in Human Activity Recognition, automatic labelling has been applied.…”
Section: Introduction (citation type: mentioning)
confidence: 99%