Search engines are critical in people’s daily lives because they determine the quality of the information people obtain through searching. Fierce competition for search rankings harms both users and search engines. Existing research mainly studies the content and links of websites; however, none of these techniques focuses on semantic analysis of links and anchor text for detection. In this paper, we propose a web spam detection method that extracts novel feature sets from the homepage source code and uses a random forest (RF) as the classifier. The novel feature sets are extracted from the homepage’s links, hypertext markup language (HTML) structure, and semantic similarity of content. We conduct experiments on the WEBSPAM-UK2007 and UK-2011 datasets using five-fold cross-validation, and we design three sets of experiments to evaluate the performance of the proposed method. Compared with other methods across different indicators, the proposed method with the novel feature sets performs better, achieving a precision of 0.929 and a recall of 0.930. The experimental results show that the proposed model can effectively detect web spam.
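As a rough illustration of the kind of homepage features described above, the following sketch extracts a few link, HTML-structure, and anchor-text similarity features using only the Python standard library. The feature names, the `extract_features` helper, and the use of Jaccard token overlap between anchor text and URL path as a stand-in for semantic similarity are all assumptions made for this example; the abstract does not specify the actual feature definitions or similarity measure.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse
import re


class HomepageFeatureParser(HTMLParser):
    """Collects (href, anchor text) pairs and a tag count from homepage HTML."""

    def __init__(self):
        super().__init__()
        self.links = []      # list of (href, anchor_text)
        self.tag_count = 0   # number of opening tags seen
        self._href = None    # href of the <a> currently open, if any
        self._text = []      # text fragments inside the open <a>

    def handle_starttag(self, tag, attrs):
        self.tag_count += 1
        if tag == "a":
            self._href = dict(attrs).get("href") or ""
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, " ".join(self._text).strip()))
            self._href = None


def tokens(text):
    """Lowercased alphanumeric tokens as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard overlap of two token sets (0.0 if either is empty)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def extract_features(html, home_url):
    """Hypothetical feature set: link counts, HTML size, anchor/URL overlap."""
    parser = HomepageFeatureParser()
    parser.feed(html)
    parser.close()

    home_host = urlparse(home_url).netloc
    external = 0
    sims = []
    for href, text in parser.links:
        parsed = urlparse(href)
        # A link is external if it names a host other than the homepage's.
        if parsed.netloc and parsed.netloc != home_host:
            external += 1
        # Assumed proxy for semantic similarity: anchor text vs. URL path.
        sims.append(jaccard(tokens(text), tokens(parsed.path)))

    n_links = len(parser.links)
    return {
        "n_links": n_links,
        "external_ratio": external / n_links if n_links else 0.0,
        "mean_anchor_url_similarity": sum(sims) / len(sims) if sims else 0.0,
        "n_tags": parser.tag_count,
    }
```

Feature dictionaries like this one could then be turned into vectors and passed to a random forest classifier, which is the classifier the abstract names; a real system would likely replace the token overlap with a proper semantic similarity model.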
Generally, the imaging quality of Fourier single-pixel imaging (FSI) degrades severely when high-speed imaging is achieved at a low sampling rate (SR). To tackle this problem, a new, to the best of our knowledge, imaging technique is proposed. First, a Hessian-based norm constraint is introduced to deal with the staircase effect caused by the low SR and total variation regularization. Second, based on the local similarity prior of consecutive frames in the time dimension, we design a temporal local image low-rank constraint for FSI; combined with a spatiotemporal random sampling method, it allows the redundant image information of consecutive frames to be utilized sufficiently. Finally, by introducing additional variables to decompose the optimization problem into multiple sub-problems and solving each one analytically, a closed-form algorithm is derived for efficient image reconstruction. Experimental results show that the proposed method improves imaging quality significantly compared with state-of-the-art methods.
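The abstract does not state the optimization problem explicitly. One plausible form, written here purely as an illustrative assumption, combines a Fourier data-fidelity term with a Hessian-based penalty and a nuclear-norm penalty on groups of local patches taken from consecutive frames. All symbols below are introduced for this sketch: $x_t$ is frame $t$, $F$ the 2D Fourier transform, $M_t$ the sampling mask for frame $t$, $y_t$ the measured Fourier coefficients, $(\mathcal{H}x_t)_i$ the $2 \times 2$ Hessian at pixel $i$, $R_j$ an operator stacking the $j$-th local patch across consecutive frames as matrix columns, and $\lambda_1, \lambda_2$ regularization weights.

```latex
\begin{aligned}
\min_{\{x_t\}} \quad
  & \sum_{t} \tfrac{1}{2}\,\bigl\| M_t F x_t - y_t \bigr\|_2^2
  + \lambda_1 \sum_{t} \sum_{i} \bigl\| (\mathcal{H} x_t)_i \bigr\|_F
  + \lambda_2 \sum_{j} \bigl\| R_j(X) \bigr\|_* ,
  \qquad X = [x_1, \dots, x_T] \\[4pt]
\text{split as} \quad
  & u_{t} = \mathcal{H} x_t , \qquad Z_j = R_j(X)
\end{aligned}
```

Under this assumed splitting, the subproblem in $u_t$ would reduce to a shrinkage step, the subproblem in $Z_j$ to singular value thresholding, and the subproblem in $x_t$ to a quadratic problem, which is consistent with the abstract's statement that each sub-problem is solved analytically.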