Data quality remains a persistent problem in practice and a challenge for research. In this study we focus on the four dimensions of data quality noted as the most important to information consumers, namely accuracy, completeness, consistency, and timeliness. These dimensions are of particular concern for operational systems, and most importantly for data warehouses, which are often used as the primary data source for analyses such as classification, a general type of data mining. However, the definitions and conceptual models of these dimensions have not been collectively considered with respect to data mining in general or classification in particular. Nor have they been considered for problem complexity. Conversely, these four dimensions of data quality have only been indirectly addressed by data mining research. Using definitions and constructs of data quality dimensions, our research evaluates the effects of both data quality and problem complexity on generated data and tests the results in a real-world case. Six different classification outcomes selected from the spectrum of classification algorithms show that data quality and problem complexity have significant main and interaction effects. From the findings of significant effects, the economics of higher data quality are evaluated for a frequent application of classification and illustrated by the real-world case.
Research in data and information quality has made significant strides over the last 20 years. It has become a unified body of knowledge incorporating techniques, methods, and applications from a variety of disciplines including information systems, computer science, operations management, organizational behavior, psychology, and statistics. With organizations viewing “Big Data”, social media data, data-driven decision-making, and analytics as critical, data quality has never been more important. We believe that data quality research is reaching the threshold of significant growth and a metamorphosis from focusing on measuring and assessing data quality—content—toward a focus on usage and context. At this stage, it is vital to understand the identity of this research area in order to recognize its current state and to effectively identify an increasing number of research opportunities within. Using Latent Semantic Analysis (LSA) to analyze the abstracts of 972 peer-reviewed journal and conference articles published over the past 20 years, this article contributes by identifying the core topics and themes that define the identity of data quality research. It further explores their trends over time, pointing to the data quality dimensions that have—and have not—been well-studied, and offering insights into topics that may provide significant opportunities in this area.
This research investigates how competition intensity and differences in cost structures affect decisions made by competing suppliers and the role that behavioral factors play as influences. We use controlled laboratory experiments to study the scenario of suppliers competing for a share of demand being outsourced by a single buyer. The buyer seeks to maximize the service level provided by suppliers by allocating based on different performance measures which create varying levels of competition intensity. The experimental treatments include those performance measures as well as differences in supplier cost structures. Our experimental results show that in the majority of cases suppliers' decisions do not confirm theoretical predictions from the Nash equilibrium, and we find patterns in those deviations. To explain them, we first evaluate behavioral factors found in the literature including bounded rationality, learning, and other-regarding behavior. We then introduce a new behavioral factor, rival-chasing. Rival-chasing builds on other-regarding behavior by considering competitors' actions in addition to their outcomes. We find that rival-chasing can explain patterns in suppliers' behavior that cannot be explained by other behavioral factors.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.