Data seldom create value by themselves. They need to be linked and combined from multiple sources, which often vary in data quality. Improving data quality is therefore a recurring challenge. In this paper, we use a case study of a large telecom company to develop a generic process pattern model for improving data quality. The process pattern model is defined as a proven series of activities aimed at improving data quality given a certain context, a particular objective, and a specific set of initial conditions. Four patterns are derived to deal with variations in the data quality of datasets. Instead of having to devise a way to improve big data quality for each situation, the process model provides data users with generic patterns that can serve as a reference model for improving big data quality.
Big data has been acknowledged for its enormous potential. Despite that potential, in a recent survey more than half of financial service organizations reported that big data has not delivered the expected value. One of the main reasons for this relates to data quality. The objective of this research is to identify the antecedents of big data quality in financial institutions, which will help in understanding how the data quality of big data analyses can be improved. For this, a literature review was performed and data was collected through three case studies, followed by content analysis. The overall findings indicate that there are no fundamentally new data quality issues in big data projects. Nevertheless, the complexity of the issues is higher, which makes it harder to assess and attain data quality in big data projects than in traditional projects. Ten antecedents of big data quality were identified, encompassing data, technology, people, process and procedure, organization, and external aspects.
Organizations are looking for ways to gain advantage from big and open linked data (BOLD) by employing statistics; however, how these benefits can be created is often unclear. A reference architecture (RA) can capitalize on experience and facilitate realizing the benefits, but may encounter challenges when applied to BOLD. The objective of this research is to evaluate the benefits and challenges of building IT systems using an RA. We do this by investigating cases of the utilization of an RA for Linked Open Statistical Data (LOSD). Benefits of using the reference architecture include reducing project complexity, avoiding having to "reinvent the wheel", easing the analysis of a (complex) system, preserving knowledge (e.g. proven concepts and practices), mitigating multiple risks by reusing proven building blocks, and providing users with a common understanding. Challenges encountered include the need for communication and learning the ins and outs of the RA, missing features, inflexibility in adding new instances and in integrating the RA with existing implementations, and the need for support for the RA from other stakeholders.