2017
DOI: 10.26483/ijarcs.v8i9.4936
Generating Queries to Crawl Hidden Web Using Keyword Sampling and Random Forest Classifier

Abstract: One of the most challenging aspects of information retrieval systems is crawling and indexing the deep web. The deep web is the part of the World Wide Web that is not publicly visible and therefore cannot be indexed. A huge amount of scholarly data, images, and videos is available in the deep web which, if indexed, could serve research and help stop illegal activities. We propose an efficient hidden web crawler that uses Sampling and Associativity Rules to find the most important and relevant keywords wh…
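The keyword-sampling step the abstract describes can be illustrated with TF-IDF scoring: sample a few result pages from the hidden-web source, then rank terms so that words frequent in one page but rare across the sample become candidate query keywords. The sketch below is not the authors' implementation; it is a minimal pure-Python illustration of TF-IDF keyword selection, with the function name and tokenized input format chosen for the example.

```python
import math
from collections import Counter

def tf_idf_keywords(documents, top_k=5):
    """Rank terms per document by TF-IDF and return the top-k candidates.

    documents: list of token lists, e.g. sampled result pages after
    tokenization (a hypothetical input format for this sketch).
    """
    n_docs = len(documents)
    # Document frequency: number of sampled pages containing each term.
    df = Counter()
    for doc in documents:
        df.update(set(doc))
    keywords = []
    for doc in documents:
        tf = Counter(doc)
        # TF-IDF: term frequency in this page times log inverse
        # document frequency across the sample.
        scores = {
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        }
        keywords.append(sorted(scores, key=scores.get, reverse=True)[:top_k])
    return keywords

# Terms shared by every sampled page get an IDF of zero, so only
# page-distinctive terms survive as candidate query keywords.
sample = [["deep", "web", "crawler", "crawler"],
          ["deep", "web", "index"]]
candidates = tf_idf_keywords(sample, top_k=2)
```

In the paper's pipeline these top-ranked terms would then be fed back into the search form as new queries; the classifier step (a random forest, per the citing work) decides which candidates are worth submitting.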

Cited by 1 publication (1 citation statement)
References 7 publications
“…Kundu and Rohatgi used an approach to generate potential input queries to web forms to uncover hidden web representations [57]. Their approach generated the input queries through web page clustering and sampling using TF/IDF and a random forest classifier to construct the input text.…”
Section: Crawling the Deep Web
confidence: 99%