To realize the vision of the next generation of the web, deep web technologies have gained considerable attention in the last few years. An eminent feature of the next-generation web is the automation of tasks. A large part of the deep web comprises online, structured, domain-specific databases that are accessed through web query interfaces. The information contained in these databases pertains to a particular domain. This highly relevant information is well suited to satisfying users' information needs and to large-scale deep web integration. To make this extraction and integration process easier, the deep web databases must be classified into standard/non-standard category domains. There are two main types of classification techniques: manual and automatic. Because the size of the deep web is growing exponentially with the passage of time, it has become nearly impossible to classify these deep web search sources manually into their respective domains. For this purpose, several automatic deep web classification techniques have been proposed in the literature. In this paper, apart from the literature survey, we propose a framework for the analysis of automatic deep web classification techniques. The framework provides a baseline for analyzing the rudiments of automatic classification techniques based on parameters such as structured/unstructured content, simple/advanced query forms, content-representative extraction methodology, level of classification, and performance evaluation criteria and their results. Furthermore, we study a number of automatic deep web classification techniques in the light of the proposed framework; an illustrative sketch of the framework's comparison parameters follows.
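The following is a minimal sketch, not part of the paper, showing how the framework's comparison parameters listed above could be represented as a record for side-by-side analysis of classification techniques; the class and field names are illustrative assumptions.

```python
# Hypothetical representation of the survey framework's analysis parameters.
from dataclasses import dataclass

@dataclass
class ClassificationTechniqueProfile:
    name: str                       # name of the automatic classification technique
    handles_structured: bool        # works on structured deep web sources
    handles_unstructured: bool      # works on unstructured deep web sources
    query_form_type: str            # "simple" or "advanced" query forms supported
    content_representative: str     # content-representative extraction methodology
    classification_level: str       # level of classification (e.g. domain, sub-domain)
    evaluation_criteria: str        # performance evaluation criteria used
    reported_results: str           # summary of the reported results

# Example entry (values are placeholders, not results from the paper).
profile = ClassificationTechniqueProfile(
    name="ExampleTechnique",
    handles_structured=True,
    handles_unstructured=False,
    query_form_type="simple",
    content_representative="form labels",
    classification_level="domain",
    evaluation_criteria="precision/recall",
    reported_results="not specified here",
)
print(profile)
```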
Over the years, a critical increase in the mass of the web has been observed. A large part of it comprises online, subject-specific databases hidden behind query interface forms, known as the deep web. Existing search engines are unable to index this highly relevant information completely due to its large volume. To make deep web content accessible, the research community has proposed organizing it using machine learning techniques. Clustering is one of the key solutions for organizing deep web databases. Existing clustering methods do not account for the semantic relevance among deep web forms. In this paper, we propose a novel method, DWSemClust, that clusters deep web databases based on the semantic relevance found among deep web forms, employing a generative probabilistic model, Latent Dirichlet Allocation (LDA), to model the content representative of deep web databases. A document comprises multiple topics, and the task of LDA is to cluster the words present in the document into "topics". The purpose of the parameter estimation process in the underlying model is to discover these topics and their proportionate distribution across documents. The deep web has a sparse topic distribution; for this reason we propose to use LDA, which is well suited to clustering under a sparse distribution of topics. Further, we employ a rich set of metadata as our content representative, comprising form contents (single attribute/multiple attributes) and page contents. Experimental results show that our proposed method clearly outperforms existing non-semantics-based clustering methods.
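To make the LDA-based clustering idea concrete, here is a minimal sketch using scikit-learn rather than the authors' DWSemClust implementation; the sample content representatives, the number of topics, and all variable names are illustrative assumptions.

```python
# Sketch: cluster deep web sources by dominant LDA topic.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# One "document" per deep web database: concatenated form labels and page text
# (the paper's content representative); these strings are made-up examples.
content_representatives = [
    "flight departure city arrival city date airline ticket",
    "book title author isbn publisher price shipping",
    "hotel check-in check-out guests rooms city nightly rate",
    "novel paperback author genre bestseller price",
]

# Bag-of-words term counts, the usual input to LDA.
vectorizer = CountVectorizer(stop_words="english")
term_counts = vectorizer.fit_transform(content_representatives)

# Fit LDA with an assumed number of latent topics (candidate domains).
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topic = lda.fit_transform(term_counts)  # per-document topic proportions

# Assign each database to its dominant topic as a simple clustering.
clusters = doc_topic.argmax(axis=1)
print(clusters)
```

In this sketch each database ends up in the topic with the highest estimated proportion, which mirrors the idea of grouping sources whose forms share semantically related vocabulary.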