Semi-supervised classifiers combine labeled and unlabeled data during the learning phase in order to increase the classifier's generalization capability. However, most successful semi-supervised classifiers involve complex ensemble structures and iterative algorithms that make their outcomes difficult to explain, so they behave like black boxes. Furthermore, during an iterative self-labeling process, mistakes can propagate if no amending procedure is used. In this paper, we build upon an interpretable self-labeling grey-box classifier that uses a black box to estimate the missing class labels and a white box to make the final predictions. We propose a Rough Set-based approach for amending the self-labeling process. We compare its performance to the vanilla version of our self-labeling grey-box and to confidence-based amending. In addition, we introduce some measures to quantify the interpretability of our model. The experimental results suggest that the proposed amending improves the accuracy and interpretability of the self-labeling grey-box, thus leading to superior results when compared to state-of-the-art semi-supervised classifiers.
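The self-labeling grey-box idea described above can be illustrated with a minimal, dependency-free sketch. Everything concrete in it is an assumption made for illustration and is not taken from the paper: a k-nearest-neighbour vote stands in for the black box, a one-feature threshold rule stands in for the white box, and the confidence threshold of 0.8, the helper names, and the toy data are all hypothetical. Only the confidence-based amending is sketched; the Rough Set-based amending is not.

```python
import math
from collections import Counter


def knn_proba(train_X, train_y, x, k=3):
    """Black-box stand-in: vote fractions among the k nearest labeled points."""
    nearest = sorted(zip(train_X, train_y), key=lambda p: math.dist(p[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return {label: count / len(nearest) for label, count in votes.items()}


def self_label(labeled_X, labeled_y, unlabeled_X, threshold=0.8, k=3):
    """Label the unlabeled points with the black box and keep only the
    confident ones (a simple form of confidence-based amending)."""
    new_X, new_y = [], []
    for x in unlabeled_X:
        proba = knn_proba(labeled_X, labeled_y, x, k)
        label, confidence = max(proba.items(), key=lambda kv: kv[1])
        if confidence >= threshold:  # low-confidence guesses are discarded
            new_X.append(x)
            new_y.append(label)
    return list(labeled_X) + new_X, list(labeled_y) + new_y


def fit_stump(X, y):
    """White-box stand-in: the single-feature threshold rule with the
    fewest training errors (the first such rule wins ties)."""
    best = None
    for f in range(len(X[0])):
        values = sorted({x[f] for x in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = Counter(lab for x, lab in zip(X, y) if x[f] <= t)
            right = Counter(lab for x, lab in zip(X, y) if x[f] > t)
            l_lab = left.most_common(1)[0][0]
            r_lab = right.most_common(1)[0][0]
            errors = len(X) - left[l_lab] - right[r_lab]
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    if best is None:  # no feature varies: fall back to the majority class
        majority = Counter(y).most_common(1)[0][0]
        return {"feature": 0, "threshold": math.inf, "left": majority, "right": majority}
    _, f, t, l_lab, r_lab = best
    return {"feature": f, "threshold": t, "left": l_lab, "right": r_lab}


def predict(stump, x):
    """Apply the readable rule: compare one feature against one threshold."""
    return stump["left"] if x[stump["feature"]] <= stump["threshold"] else stump["right"]


# Toy data (hypothetical): two well-separated classes plus one ambiguous point.
labeled_X = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (1.0, 1.0), (0.9, 0.8), (0.8, 1.1)]
labeled_y = ["a", "a", "a", "b", "b", "b"]
unlabeled_X = [(0.1, 0.1), (0.95, 0.9), (0.5, 0.55)]

X_aug, y_aug = self_label(labeled_X, labeled_y, unlabeled_X)
stump = fit_stump(X_aug, y_aug)
print(stump)
print(predict(stump, (0.05, 0.2)), predict(stump, (0.85, 0.9)))
```

In this sketch the ambiguous point near the class boundary receives a split vote and is left out of the enlarged training set, which is the kind of mistake propagation that amending is meant to limit. The final model is a single readable rule, which is what makes the overall pipeline a grey box rather than a black box.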
Information quality and organizational transparency are relevant issues for the corporate governance and sustainability of companies, as they contribute to reducing information asymmetry, decreasing risks, and improving the conduct of decision-makers, ensuring an ethical standard of organizational control. This work uses the COBIT framework of IT governance, knowledge management, and machine learning techniques to evaluate organizational transparency, considering the maturity levels of technology processes applied in 285 companies in southern Brazil. Data mining techniques were applied to analyze the 37 processes in four different domains: planning and organization, acquisition and implementation, delivery and support, and monitoring. Four learning techniques for knowledge discovery were used to build a computational model that allowed us to evaluate the organizational transparency level. The results highlight the importance of IT performance monitoring and assessment, and of internal control processes, in enabling organizations to improve their levels of transparency. These processes depend directly on the establishment of IT strategic plans and quality management, as well as IT risk and project management; therefore, an improvement in the maturity of these processes implies an increase in the levels of organizational transparency and in their reputational, financial, and accountability impact.
A Geographic Information System (GIS) named SIGOBE version 3.0 is being developed for the electricity sector. The databases that supply its alphanumeric information are the Sistema Integral de Gestión de la ECIE (SIGECIE) and the Sistema Integral de Gestión de Redes (SIGERE). Studies have identified the need for a data management model that contributes to the development of the GIS, built on a conceptual schema of the domain and able to answer the various user requests through automatic queries, in support of decision-making. To give the GIS a conceptual basis, an ontology expressed in description logics is developed to generate the features of a Case-Based Reasoning process that allows the queries to be automated. The final quality of the GIS is verified against the quality standards of ISO-9126:2002. The proposed model and its functionalities contribute to: facilitating decision-making at different levels, performing risk analysis based on the known defects of electrical installations, reducing outage time in key areas of the country, organizing vehicle routes more efficiently, and locating electrical faults more precisely.
In some machine learning applications, obtaining data instances is relatively easy, but labeling them can be expensive or tedious. Such scenarios lead to datasets with few labeled instances and a larger number of unlabeled ones. Semi-supervised classification techniques combine labeled and unlabeled data during the learning phase in order to increase the classifier's generalization capability. Regrettably, most successful semi-supervised classifiers do not allow their outcomes to be explained, and thus behave like black boxes. However, there is an increasing number of problem domains in which experts demand a clear understanding of the decision process. In this paper, we report on an extended experimental study presenting an interpretable self-labeling grey-box classifier that uses a black box to estimate the missing class labels and a white box to make the final predictions. Two different approaches for amending the self-labeling process are explored: the first based on the confidence of the black box and the second based on measures from Rough Set Theory. The results of the extended experimental study support the interpretability by means of transparency and simplicity of
Opinion mining and summarization of the growing volume of user-generated content on digital platforms (e.g., news platforms) play a significant role in the success of government programs and initiatives in digital governance, by extracting and analyzing citizens' sentiments for decision-making. Opinion mining extracts the sentiment from the content, whereas summarization aims to condense the most relevant information. However, most reported opinion summarization methods are designed to produce generic summaries, and the context that originates the opinions (e.g., the news) has usually not been considered. In this paper, we present a context-aware opinion summarization model for monitoring the opinions generated from news. In this approach, topic modeling and the news content are combined to determine the "importance" of opinionated sentences. The effectiveness of different settings of our model was evaluated through several experiments carried out over Spanish news and opinions collected from a real news platform. The results show that our model can generate opinion summaries focused on essential aspects of the news, while also covering the main topics in the opinionated texts well. The integration of term clustering, word embeddings, and similarity-based sentence-to-news scoring turned out to be the most promising and effective setting of our model.
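The sentence-to-news scoring mentioned above can be illustrated with a small sketch. It is not the authors' model: bag-of-words cosine similarity is swapped in for the word embeddings, topic modeling, and term clustering the abstract refers to, and the function names, the `top_n` parameter, and the English toy texts are all hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercased token counts; a crude stand-in for an embedding."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_opinions(news, opinions, top_n=2):
    """Score each opinion sentence against the news text and keep the top_n."""
    news_vec = bag_of_words(news)
    scored = [(cosine(bag_of_words(s), news_vec), s) for s in opinions]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [sentence for _, sentence in scored[:top_n]]


# Toy input (hypothetical): one news item and three reader comments.
news = "The city council approved a new public transport budget"
opinions = [
    "The new transport budget is too small",
    "I love pizza",
    "The council ignored cyclists again",
]
summary = rank_opinions(news, opinions)
print(summary)
```

Here the off-topic comment shares no terms with the news item and is left out of the summary, which is the intuition behind using the news as context when deciding which opinionated sentences matter.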