Qingzhao Tan scite author profile

Qingzhao Tan

4 Publications

108 Citation Statements Received

60 Citation Statements Given


Affiliations

Ministry of Industry and Information Technology, Pennsylvania State University, Beijing Institute of Technology

Publications


Applying Co-training to Clickthrough Data for Search Engine Adaptation

Tan, Chai, et al. 2004


Abstract. The information on the World Wide Web is growing without bound. Users may have very diversified preferences in the pages they target through a search engine. It is therefore a challenging task to adapt a search engine to suit the needs of a particular community of users who share similar interests. In this paper, we propose a new algorithm, Ranking SVM in a Co-training Framework (RSCF). Essentially, the RSCF algorithm takes the clickthrough data containing the items in the search result that have been clicked on by a user as an input, and generates adaptive rankers as an output. By analyzing the clickthrough data, RSCF first categorizes the data as the labelled data set, which contains the items that have been scanned already, and the unlabelled data set, which contains the items that have not yet been scanned. The labelled data is then augmented with unlabelled data to obtain a larger data set for training the rankers. We demonstrate that the RSCF algorithm produces better ranking results than the standard Ranking SVM algorithm. Based on RSCF we develop a metasearch engine that comprises MSNSearch, Wisenut, and Overture, and carry out an online experiment to show that our metasearch engine outperforms Google.


Extraction and search of chemical formulae in text documents on the web

Sun, Tan, Mitra, et al. 2007


Clustering-based incremental web crawling

Tan, Mitra 2010. ACM Trans. Inf. Syst.


When crawling resources (e.g., number of machines, crawl time) are limited, a crawler has to decide an optimal order in which to crawl and re-crawl webpages. Ideally, crawlers should request only those webpages that have changed since the last crawl; in practice, a crawler may not know whether a webpage has changed before downloading it. In this paper, we identify features of webpages that are correlated with their change frequency. We design a crawling algorithm that clusters webpages based on features that correlate with their change frequencies, obtained by examining past history. The crawler downloads a sample of webpages from each cluster and, depending upon whether a significant number of these webpages have changed in the last crawl cycle, decides whether to recrawl the entire cluster. To evaluate the performance of our incremental crawler, we develop an evaluation framework that measures which crawling policy results in the best search results for the end user. We run experiments on a real Web data set of about 300,000 distinct URLs distributed among 210 websites. The results demonstrate that the clustering-based sampling algorithm effectively clusters the pages with similar change patterns, and our clustering-based crawling algorithm outperforms existing algorithms in that it can improve the quality of the user experience for those who query the search engine.
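The sample-then-decide step described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `clusters_to_recrawl`, the `sample_size` and `threshold` parameters, and the `has_changed` callback are all hypothetical, and the abstract does not state what counts as a "significant number" of changed pages, so a simple fraction threshold is assumed here.

```python
import random
from typing import Callable, Dict, List


def clusters_to_recrawl(
    clusters: Dict[str, List[str]],
    has_changed: Callable[[str], bool],
    sample_size: int = 3,      # assumed value, not from the paper
    threshold: float = 0.5,    # assumed stand-in for "a significant number"
    seed: int = 0,
) -> List[str]:
    """Return the ids of clusters whose sampled pages changed often enough."""
    rng = random.Random(seed)
    selected = []
    for cluster_id, urls in clusters.items():
        if not urls:
            continue
        # Download (here: just check) a small random sample from the cluster.
        sample = rng.sample(urls, min(sample_size, len(urls)))
        changed = sum(1 for url in sample if has_changed(url))
        # Recrawl the whole cluster only if enough sampled pages changed.
        if changed / len(sample) >= threshold:
            selected.append(cluster_id)
    return selected


# Toy data: every "news" page changed, no "archive" page did.
clusters = {
    "news": ["n1", "n2", "n3", "n4"],
    "archive": ["a1", "a2", "a3"],
}
changed_urls = {"n1", "n2", "n3", "n4"}
to_recrawl = clusters_to_recrawl(clusters, lambda url: url in changed_urls)
print(to_recrawl)
```

In this toy run only the `news` cluster is selected. In a real crawler, `has_changed` would involve fetching the page and comparing it with the stored copy, which is the cost that sampling is meant to limit.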


Efficient progressive processing of skyline queries in peer-to-peer systems

Tan, Lee 2006


Abstract. Skyline queries have received a lot of attention from the database and information retrieval research communities. A skyline query returns the set of data objects that are not dominated by any other data object in a given dataset. However, most existing studies focus on skyline query processing in centralized systems. Only recently have skyline queries been considered in a distributed computing environment. Acknowledging the trend toward peer-to-peer (P2P) systems in distributed computing, we examine the problem of skyline query processing in P2P systems and propose innovative solutions. We exploit the data semantics embedded in semantically structured P2P overlay networks to efficiently prune the search space without compromising the quality of the query result. In addition, we propose approximate algorithms to support skyline queries where exact answers are too costly to obtain. These approximate algorithms produce high-quality answers using heuristics based on the local semantics of peer nodes. Extensive experiments validate that our algorithms provide high efficiency and scalability for skyline query processing in P2P systems.
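The dominance definition in this abstract can be made concrete with a small centralized Python sketch. It shows only the basic skyline definition using a naive quadratic scan, not the P2P pruning or approximate algorithms proposed in the paper. The names `dominates` and `skyline`, the convention that smaller values are better, and the hotel data are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a dominates b if a is no worse in every dimension and strictly
    better in at least one (assumption: smaller values are better)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def skyline(points: List[Point]) -> List[Point]:
    """Naive O(n^2) skyline: keep points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical hotels as (price, distance to beach); lower is better for both.
hotels = [(50.0, 8.0), (80.0, 2.0), (60.0, 9.0), (90.0, 1.0), (85.0, 3.0)]
result = skyline(hotels)
print(result)  # [(50.0, 8.0), (80.0, 2.0), (90.0, 1.0)]
```

Here `(60.0, 9.0)` is dominated by `(50.0, 8.0)` and `(85.0, 3.0)` by `(80.0, 2.0)`, so neither appears in the result.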


