Automatic Extraction of Web Page Text Information Based on Network Topology Coincidence Degree

Shu, Zhinian; Li, Xiaorong

doi:10.1155/2022/9220661

Cited by 6 publications

(8 citation statements)

References 37 publications

(47 reference statements)

“…According to Diffusion-based algorithms, their results are more trustworthy and less vulnerable to the type of graphs used. Reference [15] An automatic extraction approach based on network topology coincidence degree is proposed to successfully overcome the above concerns. To classify web text content, a search engine, a web crawler, and a hypertext tag are utilized, followed by dimensionality reduction.…”

Section: Literature Survey (mentioning)

confidence: 99%

Weighted PageRank Algorithm Search Engine Ranking Model for Web Pages

Shaffi; Muthulakshmi

2023

Intelligent Automation & Soft Computing

As data grows in size, search engines face new challenges in extracting more relevant content for users' searches. As a result, a number of retrieval and ranking algorithms have been employed to ensure that the results are relevant to the user's requirements. Unfortunately, most existing indexes and ranking algorithms crawl documents and web pages based on a limited set of criteria designed to meet user expectations, making it impossible to deliver exceptionally accurate results. As a result, this study investigates and analyses how search engines work, as well as the elements that contribute to higher ranks. This paper addresses the issue of bias by proposing a new ranking algorithm based on the PageRank (PR) algorithm, which is one of the most widely used page ranking algorithms. We propose weighted PageRank (WPR) algorithms to test the relationship between these various measures. The Weighted PageRank (WPR) model was used in three distinct trials to compare the rankings of documents and pages based on one or more user preference criteria. The findings of utilizing the Weighted PageRank model showed that using multiple criteria to rank final pages is better than using only one, and that some criteria had a greater impact on ranking results than others.
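
As an illustration of the weighted ranking idea this abstract describes, the following is a minimal sketch of PageRank with weighted links, computed by power iteration. It is not the authors' WPR model, whose criteria and weighting scheme are not given here: the function name `weighted_pagerank`, the edge-weight dictionary, and the damping value of 0.85 are all assumptions chosen for the example.

```python
def weighted_pagerank(edges, damping=0.85, iterations=100, tol=1e-10):
    """Rank pages by power iteration over weighted links.

    edges maps each source page to a dict {target page: link weight}.
    The weights are hypothetical; they stand in for whatever ranking
    criteria a weighted model would attach to a link.
    """
    # Collect every page that appears as a source or a target.
    nodes = set(edges)
    for targets in edges.values():
        nodes.update(targets)
    nodes = sorted(nodes)
    n = len(nodes)

    rank = {node: 1.0 / n for node in nodes}
    out_weight = {node: sum(edges.get(node, {}).values()) for node in nodes}

    for _ in range(iterations):
        # Pages with no outgoing weight spread their rank uniformly.
        dangling = sum(rank[node] for node in nodes if out_weight[node] == 0)
        new_rank = {
            node: (1.0 - damping) / n + damping * dangling / n
            for node in nodes
        }
        # Each page passes rank to its targets in proportion to link weight.
        for source, targets in edges.items():
            if out_weight[source] == 0:
                continue
            for target, weight in targets.items():
                share = weight / out_weight[source]
                new_rank[target] += damping * rank[source] * share

        delta = sum(abs(new_rank[node] - rank[node]) for node in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical three-page graph: page "a" links to "b" three times as
# strongly as it links to "c".
graph = {"a": {"b": 3.0, "c": 1.0}, "b": {"c": 1.0}, "c": {"a": 1.0}}
ranks = weighted_pagerank(graph)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 4))
```

With uniform weights this reduces to ordinary PageRank; raising the weight of a link shifts more of the source page's rank to that target.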

“…Zhinian Shu & Xiaorong Li [18] implemented an automatic extraction method of web text information based on network topology coincidence degree. Web crawler, hypertext tag, and search engine were utilized for web text information classification, and the reduction of dimensionality was carried out.…”

Section: Literature Survey (mentioning)

confidence: 99%

“…And in the similar way, proposed method also synthesized dataset for the comparison. Table 4 shows that the comparative analysis with the existing methods of WCPK [17], Automatic extraction method [18], Malicious website detection technique [19], and BERT, SoftMax [20].…”

Section: Comparative Analysis (mentioning)

confidence: 99%

Text Matching Technique-based Intelligent Web Crawler in Hybrid Mode

2023

IJIES

Web crawlers gather and analyze a large amount of data available online to obtain specific forms of objective data, such as news. Web crawlers are becoming more important since big data is used in numerous different sectors and web data is rising dramatically each year. However, when analyzing large volumes of information and making rapid decisions, the organization frequently uses minimal data, which leads to inefficient choices. In this paper, the minibatch stochastic gradient descent (SGD) optimization and radial basis function SVM are proposed to assist organizations in the targeted crawling of relevant online artifacts and semantically matching them against internal big data for better strategy decisions. The proposed method has been used and extensively evaluated in the e-procurement field. The minibatch SGD optimization and radial basis function SVM has gradually been expanded to include more fields such as robot programming and cloud hosting. The existing methods of web crawler for pharmacokinetics (WCPK), automatic extraction method, malicious website detection techniques, and BERT with softmax layer method are used to justify the effectiveness of the proposed minibatch SGD optimization and the radial basis function SVM method. The proposed method achieves better precision, recall, and f1-measure of 99.25%, 98.91%, and 99.57% on DMOZ dataset and 96.23%, 94.71%, and 97.53% on synthesized dataset when compared to the existing methods.

“…Using this method to retrieve the web portal information can avoid the limitation of slow convergence caused by too much complex data; thus, improving the retrieval speed of a computer processing cloud data. The reason is that this method can construct the decision tree of information retrieval quickly and reduce the retrieval time to a certain extent by using the dynamic information as the node of the decision tree [35,36].…”

Section: Experimental Analysis (mentioning)

confidence: 99%

Fast Retrieval Method of Portal Information Based on a Chaotic Genetic Algorithm

Zhao; Tai

2022

Mathematical Problems in Engineering

The traditional retrieval method cannot respond to the influence of the change in the portal website’s information characteristics, resulting in low efficiency. In this regard, a fast information retrieval method based on a chaotic genetic algorithm is proposed. According to the relevant theory of association rules, the correlation between information data of dynamic portal websites is calculated; different portal website information is retrieved based on the Markov model output; a chaotic genetic algorithm is used to fuse different portal website information. The information data constructs a decision tree for rapid retrieval of portal information, uses the vector form to express the characteristics of portal information, and finally realizes the rapid retrieval of portal information. The experimental results show that the designed method takes up to 15 ms when the sample complexity is high, which shows that the designed method has high efficiency and is of great significance in practical applications.
