Pavan Kumar scite author profile

Pavan Kumar

5 Publications

10 Citation Statements Received

11 Citation Statements Given

Affiliations

Dhanalakshmi Srinivasan Group of Institutions, QNu Labs (India)

Publications

URL normalization for de-duplication of web pages

Agarwal, Koppula, Leela et al. 2009

The presence of duplicate documents on the World Wide Web adversely affects crawling, indexing and relevance, which are the core building blocks of web search. In this paper, we present a set of techniques to mine rules from URLs and use these learnt rules for de-duplication using just URL strings, without fetching the content explicitly. Our technique consists of mining the crawl logs and using clusters of similar pages to extract specific rules from the URLs belonging to each cluster. Preserving every mined rule for de-duplication is not efficient because of the large number of specific rules. We present a machine learning technique to generalize the set of rules, which reduces the resource footprint enough to be usable at web scale. The rule extraction techniques are robust against web-site-specific URL conventions. We demonstrate the effectiveness of our techniques through experimental evaluation.

Learning website hierarchies for keyword enrichment in contextual advertising

Kumar, Leela, Parsana et al. 2011

A Fuzzy Reliability Model For The Effect Of Corticosterone Based On Two Parameter Distribution

Kumar, Venkatesh 2017

Ontology-based effective information retrieval from the web using concept aware user profile construction

Kumar, Samath, Iqbal 2018

IJENM

Relevance-index size tradeoff in contextual advertising

Kumar, Leela, Parsana et al. 2010
