2020
DOI: 10.3906/elk-2004-67

A regular expression generator based on CSS selectors for efficient extraction from HTML pages

Abstract: Cascading Style Sheets (CSS) selectors are patterns used to select HTML elements. They are often preferred in web data extraction because they are easy to prepare and have short expressions. In order to be able to extract data from web pages by using these patterns, a Document Object Model (DOM) tree is constructed by an HTML parser for a web page. The construction process of this tree and the extraction process using this tree increase time and memory costs depending on the number of HTML elements and their h…
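To illustrate the general idea the abstract points at (extracting with a regular expression instead of building a DOM tree), here is a minimal Python sketch. It is not the paper's generator, whose details are cut off above. The helper name `selector_to_regex`, the three supported selector forms (`tag`, `tag.class`, `tag#id`), and the sample HTML are all assumptions made for illustration. The sketch also assumes double-quoted attributes and no nested elements with the same tag name.

```python
import re


def selector_to_regex(selector: str) -> re.Pattern:
    """Hypothetical helper: map 'tag', 'tag.class' or 'tag#id' to a regex
    that captures the inner text of matching elements without a DOM tree."""
    m = re.fullmatch(r"([a-zA-Z][\w-]*)(?:([.#])([\w-]+))?", selector)
    if m is None:
        raise ValueError(f"unsupported selector: {selector!r}")
    tag, kind, name = m.groups()
    if kind == ".":
        # The class attribute must contain `name` as a whitespace-separated token.
        attr = rf'[^>]*\sclass\s*=\s*"(?:[^"]*\s)?{re.escape(name)}(?:\s[^"]*)?"'
    elif kind == "#":
        # The id attribute must equal `name` exactly.
        attr = rf'[^>]*\sid\s*=\s*"{re.escape(name)}"'
    else:
        attr = ""
    # Non-greedy body match: stops at the first closing tag of the same name.
    return re.compile(rf"<{tag}\b{attr}[^>]*>(.*?)</{tag}>", re.DOTALL | re.IGNORECASE)


# Sample page fragment (made up for this example).
html = (
    '<div class="item price">12.50</div>'
    '<div class="other">x</div>'
    '<span id="t">Title</span>'
)

print(selector_to_regex("div.price").findall(html))  # ['12.50']
print(selector_to_regex("span#t").findall(html))     # ['Title']
```

A sketch like this trades robustness for speed: it skips parsing entirely, but it will misbehave on nested same-name elements or unusual attribute quoting, which a DOM-based selector engine handles correctly.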

Cited by 6 publications (4 citation statements)
References 14 publications
“…Other methods that can be used to retrieve data automatically, known as automated web scraping, include HTML parsing, regex, DOM parsing, and XPath [8]. These methods retrieve data automatically by finding extraction patterns from one or many target web pages [9]. This can, of course, minimize the time needed for data retrieval.…”
Section: Introduction (unclassified)
“…This method involves selecting elements based on CSS class, id, tag name, and other attributes. CSS selectors have patterns that are short and easy to write [9]. The pseudocode for web-scraping data retrieval in this study is shown in Figure 3.…”
Section: Selenium (unclassified)
“…That is, to give the presentation style and user interface design of the web page. According to the author [25], cascading style sheet selectors are patterns that are used to select HTML elements. In addition, CSS technology is useful to adopt font size in a new responsive web suitable for different screen sizes, including tiny devices [26].…”
Section: Bootstrap (mentioning)
confidence: 99%
“…Cascading Style Sheets (CSS for short) is a style sheet language that gives HTML-based web pages a more presentable appearance. In addition, it is a way to make a web page its unique look and feel by modifying things like font, size, and color (Uzun, 2020). In this particular research project, this programming language was utilized to design the overall appearance of the front-end interfaces.…”
Section: (V) Cascading Style Sheets (mentioning)
confidence: 99%