Storing, analyzing, and accessing data is a growing problem for organizations. Competitive pressures and new regulations require organizations to handle increasing volumes and varieties of data efficiently, but this does not come cheap. As the demands of Big Data exceed the constraints of traditional relational databases, evaluating legacy infrastructure and assessing new technology has become a necessity for most organizations, not only for competitive advantage but also for compliance. The challenge is how well an organization's legacy infrastructure can integrate Big Data, which must be accommodated one way or another. Legacy systems encode significant and invaluable business logic that organizations cannot afford to discard or replace; these assets represent many years of coding, development, real-world experience, enhancement, modification, and debugging. Most legacy systems, however, were developed without the process models or data models now needed to support and integrate Big Data, so integrating Big Data requires modernization of the legacy system. Many approaches to legacy-system modernization exist, but none of them focuses on integrating Big Data, and legacy systems hold valuable data that is too important to be lost during modernization. Addressing the issues and scope of incorporating Big Data with legacy systems allows mature legacy systems to become part of these groundswell changes. Many aspects of this integration remain unaddressed: incorporating data from new sources, especially "live" sources, into existing legacy systems is a technical challenge, and the sheer volume of Big Data can be daunting. This paper presents the scope of integrating Big Data into the modernization of legacy systems.
The data management process comprises a set of tasks in which data quality management (DQM) is one of the core components. Data quality, however, is a multidimensional concept, and the nature of data quality issues is very diverse. One of the most widely recognised data quality challenges, which becomes particularly acute when data come from multiple sources (a typical situation in today's data-driven world), is duplicates, or non-uniqueness. Duplicates have also been recognised as one of the key domain-specific data quality dimensions in Internet of Things (IoT) application domains, where smart grids and healthcare dominate. Duplicate data lead to inaccurate analyses and, in turn, to wrong decisions; they negatively affect data-driven and data processing activities such as the development of models, forecasts, and simulations; they harm customer service, risk and crisis management, and service personalisation in terms of both accuracy and trustworthiness; and they decrease user adoption and satisfaction. The process of determining and eliminating duplicates is known as deduplication, while the process of finding records in one or more databases that refer to the same entities is known as record linkage. To find duplicates, data sets are compared with each other using similarity functions that compare two input strings, which requires quadratic time complexity. To defuse the quadratic complexity of the problem, especially in large data sources, record linkage methods such as blocking and sorted neighbourhood are used. In this paper, we propose a six-step record linkage deduplication framework. Its operation is demonstrated on a simplified example of research data artifacts, such as publications and research projects, from a real-world research institution representing the Research Information Systems (RIS) domain. To make the proposed framework usable in practice, we integrated it into a tool that is already used in practice by developing a prototype extension for the well-known DataCleaner. The framework detects and visualises duplicates, presenting the identified redundancies to the user in a user-friendly manner and allowing their further elimination. Removing these redundancies improves data quality and thereby improves analyses and decision-making. This study calls on other researchers to take a step towards the "golden record" that can be achieved when all data quality issues are recognised and resolved, thus moving towards absolute data quality.
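The following is a minimal Python sketch of the blocking idea the abstract refers to: instead of comparing every pair of records (quadratic in the data set size), records are grouped by a cheap blocking key and similarity functions are applied only within each block. The record fields, the blocking key, and the 0.9 threshold are illustrative assumptions, not the authors' six-step framework or the DataCleaner extension.

# Blocking-based duplicate detection sketch (assumed record layout and threshold).
from collections import defaultdict
from difflib import SequenceMatcher

records = [
    {"id": 1, "title": "Data Quality in Research Information Systems", "year": 2020},
    {"id": 2, "title": "Data quality in research information systems.", "year": 2020},
    {"id": 3, "title": "Record Linkage for IoT Data Streams", "year": 2019},
]

def blocking_key(record):
    # Cheap key (first four title characters plus year) so that expensive
    # pairwise comparisons run only inside each block, not across the whole set.
    return (record["title"][:4].lower(), record["year"])

blocks = defaultdict(list)
for rec in records:
    blocks[blocking_key(rec)].append(rec)

def similarity(a, b):
    # String similarity in [0, 1]; production systems often use Jaro-Winkler
    # or Levenshtein instead of difflib's ratio.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

duplicates = []
for block in blocks.values():
    # Comparison is still quadratic, but only within each (small) block.
    for i in range(len(block)):
        for j in range(i + 1, len(block)):
            if similarity(block[i]["title"], block[j]["title"]) >= 0.9:
                duplicates.append((block[i]["id"], block[j]["id"]))

print(duplicates)  # e.g. [(1, 2)]

The sorted-neighbourhood method mentioned in the abstract follows the same cost-reduction principle, but sorts records by a key and compares only those falling within a sliding window.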
Educational institutions rely on academic citizenship behaviors to construct knowledge in a responsible manner. However, they often struggle to contain the unlawful reuse of knowledge (or academic citizenship transgressions) by some learning communities. This study draws upon secondary data from two televised episodes describing contract cheating (or ghostwriting) practices prevalent among international student communities. Against this background, we investigate emergent teaching and learning structures that have been extended into formal and informal spaces through mediating technologies. Learners' interactions in formal spaces are shaped by ongoing informal social experiences within a shared cultural context, which in turn influence learner agency. Building upon existing theories, we develop an analytical lens to understand the rationale behind cheating behaviors. Citizenship behaviors are based on individual and collective perceptions of what constitutes acceptable or unacceptable behavior. That is, learners who are low in motivation and less engaged with learning may collude, more so if cheating is not condemned by members of their informal social spaces. Our analytical lens describes the institutional, cultural, technological, social, and behavioral contexts that influence learner agency.
Software reuse has been regarded as one of the most important means of improving software development productivity and software quality. Research and practice have shown that software reuse enables products to be developed from reusable assets in a routine manner, on an industrial scale. However, successful application of software reuse is limited to certain domains and is not widespread across the software industry. There is no clear consensus on the input and output artifacts or on the requirements that an effective reuse process must have. To examine these issues and concerns in more detail, we conducted surveys on software reuse in the conventional software engineering (CSE) community and in the software product line (SPL) community, comparing and contrasting the results for similarities and differences in software reuse philosophy. This paper outlines some of the identified similarities and differences in software reuse issues and concerns in both communities, and what one community can gain from the other to overcome the identified software reuse problems. The comparison highlights areas where the existing SPL and CSE communities provide extensive support for software reuse and those in which they are deficient, suggesting an understanding of the factors resisting software reuse in both communities.