2019
DOI: 10.4018/ijwp.2019070103

Personalized Content Extraction and Text Classification Using Effective Web Scraping Techniques

Abstract: Web scraping is a technique to extract information from various web documents automatically. It retrieves the related contents based on the query, aggregates and transforms the data from an unstructured format into a structured representation. Text classification becomes a vital phase to summarize the data and in categorizing the webpages adequately. In this article, using effective web scraping methodologies, the data is initially extracted from websites, then transformed into a structured form. Based on the …
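
The pipeline the abstract outlines (extract content from web documents, turn it into a structured representation, then categorize it) can be illustrated with a minimal sketch. This is not the authors' method, which the truncated abstract does not specify. It assumes, purely for illustration, that each item of interest is an `<h2>` heading wrapping a link, and it substitutes a toy keyword lookup for a real text classifier. `HeadlineParser`, `extract_records`, `classify`, `SAMPLE_HTML`, and `KEYWORDS` are hypothetical names; only Python's standard `html.parser` module is used.

```python
from html.parser import HTMLParser


class HeadlineParser(HTMLParser):
    """Collects the text of each <h2> element and the first link inside it."""

    def __init__(self):
        super().__init__()
        self.records = []      # structured output: one dict per heading
        self._in_h2 = False
        self._current = None

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self._in_h2 = True
            self._current = {"title": "", "url": None}
        elif tag == "a" and self._in_h2 and self._current["url"] is None:
            self._current["url"] = dict(attrs).get("href")

    def handle_data(self, data):
        if self._in_h2:
            self._current["title"] += data

    def handle_endtag(self, tag):
        if tag == "h2" and self._in_h2:
            self._current["title"] = self._current["title"].strip()
            self.records.append(self._current)
            self._in_h2 = False
            self._current = None


def extract_records(html):
    """Turn unstructured HTML into a list of {'title', 'url'} records."""
    parser = HeadlineParser()
    parser.feed(html)
    parser.close()
    return parser.records


def classify(title, keyword_map):
    """Toy stand-in for a classifier: first category with a matching keyword."""
    lowered = title.lower()
    for category, keywords in keyword_map.items():
        if any(keyword in lowered for keyword in keywords):
            return category
    return "other"


# Hypothetical input page and category keywords, for illustration only.
SAMPLE_HTML = """
<html><body>
<h2><a href="/a1">Reactor safety review published</a></h2>
<p>Body text that is ignored.</p>
<h2><a href="/a2">Local football team wins cup</a></h2>
</body></html>
"""
KEYWORDS = {"energy": ["reactor", "nuclear"], "sport": ["football", "cup"]}

if __name__ == "__main__":
    for record in extract_records(SAMPLE_HTML):
        record["category"] = classify(record["title"], KEYWORDS)
        print(record)
```

A real scraper would also need to fetch pages over HTTP, respect each site's terms and robots rules, and replace the keyword lookup with a trained classifier; those parts are left out here to keep the sketch self-contained.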

Cited by 69 publications (25 citation statements)
References 17 publications
“…By applying data scraping methods (Karthikeyan, Sekaran, Ranjith, & Balajee, 2019), the authors of the article created a unique tool of article collection by using R language. Articles were collected from the news portals by using keywords 'Visaginas city,' 'Ignalina NPP,' 'Astravyets NPP,' 'Chernobyl.'…”
Section: Methods; citation type: mentioning
confidence: 99%
“…Web scraping is a technique for automatically extracting information from various web documents. Web scraping retrieves related content based on a query, then aggregates and transforms the data from an unstructured format into a structured representation (Karthikeyan et al., 2019). The IT frameworks covered include ITIL, ISO 27001, and COBIT, which are classified by title and keyword for each framework.…”
Section: Results and Discussion, Data Collection Results; citation type: unclassified
“…Execution of the framework is affecting in terms of exactness or computation time. In a number of methodologies, computation is more and it cannot apply inside the real-time environment [20] [21].…”
Section: Related Work; citation type: mentioning
confidence: 99%