In the last decades, an increasing number of employers and job seekers have been relying on Web resources to get in touch and to find a job. If appropriately retrieved and analyzed, the huge number of job vacancies available today on on-line job portals can provide detailed and valuable information about the Web Labor Market dynamics and trends. In particular, this information can be useful to all actors, public and private, who play a role in the European Labor Market. This paper presents WoLMIS, a system aimed at collecting and automatically classifying multilingual Web job vacancies with respect to a standard taxonomy of occupations. The proposed system has been developed for the Cedefop European agency, which supports the development of European Vocational Education and Training (VET) policies and contributes to their implementation. In particular, WoLMIS allows analysts and Labor Market specialists to make sense of Labor Market dynamics and trends of several countries in Europe, by overcoming linguistic boundaries across national borders. A detailed experimental evaluation analysis is also provided for a set of about 2 million job vacancies, collected from a set of UK and Irish Web job sites from June to September 2015.
Today the Web represents a rich source of labour market data for both public and private operators, as a growing number of job offers are advertised through Web portals and services. In this paper we apply and compare several techniques, namely explicit-rules, machine learning, and LDA-based algorithms to classify a real dataset of Web job offers collected from 12 heterogeneous sources against a standard classification system of occupations
The rapid growth of Web usage for advertising job positions provides a great opportunity for real-time labour market monitoring. This is the aim of Labour Market Intelligence (LMI), a field that is becoming increasingly relevant to EU Labour Market policies design and evaluation. The analysis of Web job vacancies, indeed, represents a competitive advantage to labour market stakeholders with respect to classical survey-based analyses, as it allows for reducing the time-to-market of the analysis by moving towards a fact-based decision making model. In this paper, we present our approach for automatically classifying million Web job vacancies on a standard taxonomy of occupations. We show how this problem has been expressed in terms of text classification via machine learning. Then, we provide details about the classification pipelines we evaluated and implemented, along with the outcomes of the validation activities. Finally, we discuss how machine learning contributed to the LMI needs of the European Organisation that supported the project.
Data Cleaning represents a crucial and error prone activity in KDD that might have unpredictable effects on data analytics, affecting the believability of the whole KDD process. In this paper we describe how a bridge between AI Planning and Data Quality communities has been made, by expressing both the data quality and cleaning tasks in terms of AI planning. We also report a real-life application of our approach.
The data extracted from electronic archives is a valuable asset; however, the issue of the (poor) data quality should be addressed before performing data analysis and decision-making activities. Poor data quality is frequently cleansed using business rules derived from domain knowledge. Unfortunately, the process of designing and implementing cleansing activities based on business rules requires a relevant effort. In this article, we illustrate a model-based approach useful to perform inconsistency identification and corrective interventions, thus simplifying the process of developing cleansing activities. The article shows how the cleansing activities required to perform a sensitivity analysis can be easily developed using the proposed model-based approach. The sensitivity analysis provides insights on how the cleansing activities can affect the results of indicators computation. The approach has been successfully used on a database describing the working histories of an Italian area population. A model formalizing how data should evolve over time (i.e., a data consistency model) in such domain was created (by means of formal methods) and used to perform the cleansing and sensitivity analysis activities.
Abstract:Research on data quality is growing in importance in both industrial and academic communities, as it aims at deriving knowledge (and then value) from data. Information Systems generate a lot of data useful for studying the dynamics of subjects' behaviours or phenomena over time, making the quality of data a crucial aspect for guaranteeing the believability of the overall knowledge discovery process. In such a scenario, data cleansing techniques, i.e., automatic methods to cleanse a dirty dataset, are paramount. However, when multiple cleansing alternatives are available a policy is required for choosing between them. The policy design task still relies on the experience of domain-experts, and this makes the automatic identification of accurate policies a significant issue. This paper extends the Universal Cleaning Process enabling the automatic generation of an accurate cleansing policy derived from the dataset to be analysed. The proposed approach has been implemented and tested on an on-line benchmark dataset, a real-world instance of the Labour Market Domain. Our preliminary results show that our approach would represent a contribution towards the generation of data-driven policy, reducing significantly the domain-experts intervention for policy specification. Finally, the generated results have been made publicly available for downloading.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.