2021
DOI: 10.12928/telkomnika.v19i1.16205

WEIDJ: Development of a new algorithm for semi-structured web data extraction

Abstract: In the era of industrial digitalization, people are increasingly investing in solutions that support their processes for data collection, data analysis and performance improvement. This paper considers advancing web-scale knowledge extraction and alignment by integrating a few sources and exploring different methods of aggregation and attention, with a focus on image information. The main aim of data extraction from semi-structured data is to retrieve useful information from the web. The dat…

Cited by 3 publications (4 citation statements)
References 15 publications

“…This method is useful in handling the structure of data, whether it is structured, semi-structured or unstructured. The second part is related to the knowledge based 1 shows general models for three web data extraction models; DOM [23], WHDJ [24] and WEIDJ [25]. In addition to the basic capabilities of WEIDJ, our extractor also provides several other useful and user's friendly features.…”
Section: Methods (mentioning)
confidence: 99%
“…The image extraction has been extracted in three ways: a) The extraction of images in general way b) The extraction of images by considering the size of images in two parts; 50*50 pixels and 128*128 pixels. c) The extraction of images is tested randomly at different levels; 5 pages, 10 pages, 15 pages, 20 pages, 25 pages and 30 pages.…”
Section: Methods (mentioning)
confidence: 99%
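
The size-based selection described in the statement above (images of 50*50 and 128*128 pixels) can be illustrated with a minimal sketch. It is not the WEIDJ algorithm or the citing paper's extractor: it assumes that image dimensions are declared in the `width` and `height` attributes of each `<img>` tag, and the names `ImageCollector` and `images_of_size` are hypothetical, introduced only for this example.

```python
from html.parser import HTMLParser


class ImageCollector(HTMLParser):
    """Collects (src, width, height) for <img> tags with declared integer dimensions."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        try:
            width = int(attributes["width"])
            height = int(attributes["height"])
        except (KeyError, TypeError, ValueError):
            # Assumption: images without usable declared dimensions are skipped.
            return
        self.images.append((attributes.get("src"), width, height))


def images_of_size(html, size):
    """Return the src of every image whose declared (width, height) equals size."""
    parser = ImageCollector()
    parser.feed(html)
    parser.close()
    return [src for src, width, height in parser.images if (width, height) == size]


# Hypothetical sample page used only to exercise the sketch.
SAMPLE_PAGE = """
<html><body>
  <img src="a.png" width="50" height="50">
  <img src="b.png" width="128" height="128"/>
  <img src="c.png" width="200" height="100">
  <img src="d.png">
</body></html>
"""

small_images = images_of_size(SAMPLE_PAGE, (50, 50))
medium_images = images_of_size(SAMPLE_PAGE, (128, 128))
```

Relying on declared attributes keeps the sketch within the standard library; an extractor that needs the true pixel size would have to download and decode each image instead.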
“…International Data Corporation stated that unstructured data would make up 95% of all data worldwide in 2020, with a compound annual growth rate of 65% [30]. Due to the quality and usability concerns with large unstructured datasets, structured data are more relevant and valuable than unstructured or semi-structured data [31]. It has been stated that all prospective big data solutions are hampered by the unstructured nature of data, which have no schema, many formats, originates from various sources, and lacks standards [32].…”
Section: Related Work (mentioning)
confidence: 99%
“…However, in some cases, these data extraction tools and methods do not offer data transformation to convert the data to a structured form. Based on this, efforts are needed to deal with this problem and convert the unstructured form of data into more meaningful and structured information [10,31,39]. The development for automating the extraction of semi-structured web data is much needed.…”
Section: Related Work (mentioning)
confidence: 99%