Abstract--Web Services are emerging technologies that enable application-to-application communication and the reuse of services over the Web. The Semantic Web improves the quality of existing tasks, including Web service discovery, invocation, composition, monitoring, and recovery, by describing Web service capabilities and content in a computer-interpretable language. To provide most of the requested Web services, a Web service matchmaker is usually required. Web service matchmaking is the process of finding an appropriate provider for a requester through a middle agent. To provide the right service for the right user request, Quality of Service (QoS)-based Web service selection is widely used. Employing QoS in Web service selection helps satisfy user requirements by discovering the best service(s) in terms of the required QoS. Inspired by the model of Internet search engines such as Yahoo and Google, in this paper we provide a QoS-based service selection algorithm that is able to identify the best candidate semantic Web service(s) given the description of the requested service(s) and the QoS criteria of the user requirements. In addition, our approach includes a ranking method for those services. We also show how we employ data warehousing techniques to model the service selection problem. The proposed algorithm integrates a traditional matchmaking mechanism with data warehousing techniques. This integration of methodologies enables us to employ the historical preferences of the user to provide better selection in future searches. The main result of the paper is a generic framework that is implemented to demonstrate the feasibility of the proposed algorithm for QoS-based Web applications. Our experimental results show that the algorithm indeed performs well and increases system reliability.
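As an illustration of the QoS-based selection idea, the minimal sketch below scores functionally matching candidates with a weighted sum of min-max-normalized QoS attributes and ranks them. The attribute names, weights, and candidate values are hypothetical assumptions; the paper's actual algorithm additionally draws on data-warehoused user history, which is not modeled here.

```python
# Hypothetical QoS-aware ranking sketch: weighted sum of normalized attributes.
candidates = {
    "svcA": {"response_time": 120.0, "availability": 0.99, "cost": 0.05},
    "svcB": {"response_time": 250.0, "availability": 0.95, "cost": 0.02},
}
weights = {"response_time": 0.5, "availability": 0.3, "cost": 0.2}  # assumed user priorities
lower_is_better = {"response_time", "cost"}  # attributes where smaller values are preferred

def normalize(attr, value):
    """Min-max normalize across candidates, flipping cost-like attributes."""
    vals = [c[attr] for c in candidates.values()]
    lo, hi = min(vals), max(vals)
    if hi == lo:
        return 1.0
    score = (value - lo) / (hi - lo)
    return 1.0 - score if attr in lower_is_better else score

def qos_score(svc):
    return sum(w * normalize(a, candidates[svc][a]) for a, w in weights.items())

ranked = sorted(candidates, key=qos_score, reverse=True)
print(ranked)  # best-matching service first, e.g. ['svcA', 'svcB']
```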
Egypt has the largest and most significant higher education system in the Middle East and North Africa, but it has been continuously facing serious and accumulating challenges. The gap between what exists and what is needed for self-regulation and improvement processes is not entirely clear in the face of these challenges. The effective use of information technology in higher education requires good, new techniques as well as rational strategies. Reforming higher education through strategies based on data analysis of the current situation will affect the overall performance of the transitional state and will shape new paradigms in the development of the higher education system in Egypt. This research aims to develop a Composite Index (CI) model based on a set of Key Performance Indicators (KPIs) commensurate with the nature of higher education institutions in Egypt. The outcomes of the composite index aim to measure the overall performance of institutions and provide a unified ranking method in this context. KPIs are defined as descriptions of the key success factors related to institutional sustainability. These KPIs are classified into main areas and sub-indicators. Within this scope, the indicators were weighted via the Analytic Hierarchy Process (AHP) method according to their significance levels. A pairwise comparison survey template and a database web application were developed to collect narrative responses, apply the algorithm, and extract results. The research study was conducted with 40 professors from 19 renowned universities in Egypt serving as education experts. The status of the composite index model implementation is discussed from theoretical and technical perspectives.
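The AHP weighting step can be illustrated with a short sketch: given a pairwise comparison matrix on Saaty's 1-9 scale (the matrix values below are hypothetical, not the survey's actual responses), the indicator weights are the normalized principal eigenvector, and the consistency ratio checks whether the expert judgments are acceptably consistent.

```python
import numpy as np

# Hypothetical pairwise comparison matrix for three KPI areas
# (Saaty's 1-9 scale; A[i][j] = importance of area i over area j).
A = np.array([
    [1.0, 3.0, 5.0],
    [1/3, 1.0, 2.0],
    [1/5, 1/2, 1.0],
])

# Priority weights: principal right eigenvector, normalized to sum to 1.
eigvals, eigvecs = np.linalg.eig(A)
k = np.argmax(eigvals.real)
w = np.abs(eigvecs[:, k].real)
w = w / w.sum()

# Consistency check: CI = (lambda_max - n) / (n - 1), CR = CI / RI.
n = A.shape[0]
lambda_max = eigvals.real[k]
CI = (lambda_max - n) / (n - 1)
RI = {3: 0.58, 4: 0.90, 5: 1.12}[n]  # Saaty's random consistency index
CR = CI / RI
print(w, CR)  # CR < 0.1 is conventionally considered acceptable
```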
Most of the current work on skyline queries has mainly dealt with querying static query points over static data sets. With the advances in wireless communication, mobile computing, and positioning technologies, it has become possible to obtain and manage (model, index, query, etc.) the trajectories of moving objects in real life, and consequently the need for continuous skyline query processing has become more and more pressing. In this paper, we address the problem of efficiently maintaining continuous skyline queries that contain both static and dynamic attributes. We present a Multi-level Continuous Skyline Query (MCSQ) algorithm, which creates a pre-computed skyline data set, facilitates skyline updates, and improves query running time and performance. In brief, our algorithm proceeds as follows. First, we distinguish the data points that are permanently in the skyline and use them to derive a search bound. Second, we establish a pre-computed data set for the dynamic skyline that depends on the number of skyline levels (M), which is later used to update the first (initial) skyline points. Then, every time the skyline needs to be updated, we use the pre-computed skyline data sets to update the previous skyline set and consequently the first skyline. Finally, we present experimental results to demonstrate the performance and efficiency of our algorithm.
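To make the multi-level notion concrete, the sketch below computes M successive skyline layers by repeatedly peeling off the skyline of the remaining points (assuming smaller is better in every dimension). This is only a naive illustration of skyline levels on static points, not the MCSQ maintenance algorithm itself, which additionally handles dynamic attributes and incremental updates.

```python
def dominates(p, q):
    """p dominates q if p is no worse in every dimension and strictly
    better in at least one (smaller values assumed better)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

def skyline(points):
    """Points not dominated by any other point."""
    return [p for p in points if not any(dominates(q, p) for q in points if q != p)]

def skyline_levels(points, m):
    """Peel off m layers: level 1 is the skyline, level 2 the skyline
    of what remains, and so on (the pre-computed levels used for updates)."""
    levels, rest = [], list(points)
    for _ in range(m):
        if not rest:
            break
        layer = skyline(rest)
        levels.append(layer)
        rest = [p for p in rest if p not in layer]
    return levels

pts = [(1, 9), (2, 4), (3, 3), (5, 1), (4, 6), (6, 2)]
print(skyline_levels(pts, 2))  # level 1 skyline, then the second layer
```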
Entity Resolution (ER) is defined as the process of identifying records/objects that correspond to real-world objects/entities. To define a good ER approach, the schema of the data should be well known. In addition, schema alignment of multiple datasets is not an easy task and may require either a domain expert or an ML algorithm to select which attributes to match. Schema-agnostic blocking tries to solve this problem by considering each token as a blocking key regardless of the attribute it appears in, and it may be coupled with meta-blocking to reduce the number of false negatives. However, it requires an exact match of tokens, which rarely occurs in real datasets, and it results in very low precision. To overcome these issues, we propose a novel and efficient ER approach for big data, implemented in Apache Spark. The proposed approach avoids schema alignment by treating the attributes as a bag of words and generating a set of n-grams that are transformed into vectors; the generated vectors are then compared using a chosen similarity measure. The proposed approach is generic, as it can accept all types of datasets. It consists of five consecutive sub-modules: 1) dataset acquisition; 2) dataset pre-processing; 3) settings selection, where all settings of the proposed approach are chosen, such as the blocking key, the significant attributes, the NLP techniques, the ER threshold, and the ER scenario; 4) ER pipeline construction; and 5) clustering, where similar records are grouped into the same cluster. The ER pipeline accepts two types of attributes, Weighted Attributes (WA) or Compound Attributes (CA), in addition to all the settings selected in the third sub-module. The pipeline consists of five phases: 1) generating the tokens composing the attributes; 2) generating n-grams of length n; 3) applying hashing Term Frequency (TF) to convert each set of n-grams into a fixed-length feature vector; 4) applying Locality Sensitive Hashing (LSH), which maps similar input items to the same buckets with a higher probability than dissimilar input items; and 5) classifying pairs of objects as duplicates or not according to the calculated similarity between them. We introduce seven different scenarios as input to the ER pipeline. To minimize the number of comparisons, we propose a length filter, which greatly contributes to improving the effectiveness of the proposed approach: it achieves the highest F-measure with the existing computational resources and scales well with the available worker nodes. Three results are revealed: 1) using CA in the different scenarios achieves better results than a single WA in terms of efficiency and effectiveness; 2) scenarios 3 and 4 achieve the best running time, because using Soundex and stemming reduces the running time of the proposed approach; and 3) scenario 7 achieves the highest F-measure, because by utilizing the length filter we only compare records whose string lengths are within a pre-determined percentage of each other. LSH maps similar input items to the same buckets with a higher probability than dissimilar ones and takes numHashTables as a parameter; since increasing the number of candidate pairs with the same numHashTables reduces the accuracy of the model, utilizing the length filter to minimize the number of candidates in turn increases the accuracy of the approach.
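A minimal PySpark sketch of a pipeline along these lines follows, chaining Spark ML's Tokenizer, NGram, HashingTF, and MinHashLSH stages and then running an approximate similarity self-join. The sample records, numFeatures, numHashTables, the 0.6 Jaccard-distance threshold, and the 20% length-filter tolerance are illustrative assumptions, not the paper's tuned settings or its full seven-scenario design.

```python
from pyspark.sql import SparkSession, functions as F
from pyspark.ml import Pipeline
from pyspark.ml.feature import Tokenizer, NGram, HashingTF, MinHashLSH

spark = SparkSession.builder.appName("er-sketch").getOrCreate()

# Toy records treated schema-agnostically as a single bag-of-words string.
df = spark.createDataFrame(
    [(0, "john smith 42 main street"),
     (1, "jon smith 42 main street"),
     (2, "alice jones 7 oak avenue")],
    ["id", "record"],
)

pipeline = Pipeline(stages=[
    Tokenizer(inputCol="record", outputCol="tokens"),          # phase 1: tokenize
    NGram(n=2, inputCol="tokens", outputCol="ngrams"),          # phase 2: n-grams
    HashingTF(inputCol="ngrams", outputCol="features",
              numFeatures=1 << 18),                             # phase 3: hashing TF
    MinHashLSH(inputCol="features", outputCol="hashes",
               numHashTables=5),                                # phase 4: LSH buckets
])
model = pipeline.fit(df)
hashed = model.transform(df)

# Phase 5: candidate pairs within a Jaccard distance of 0.6 (assumed threshold).
lsh = model.stages[-1]
pairs = lsh.approxSimilarityJoin(hashed, hashed, 0.6, distCol="jaccard")
pairs = pairs.filter(F.col("datasetA.id") < F.col("datasetB.id"))

# Hypothetical length filter: keep pairs whose record lengths differ by <= 20%.
pairs = pairs.filter(
    F.abs(F.length("datasetA.record") - F.length("datasetB.record"))
    <= 0.2 * F.length("datasetA.record")
)
pairs.select("datasetA.id", "datasetB.id", "jaccard").show()
```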