Recent research in scalable model-driven engineering now allows very large models to be stored and queried. Due to their size, rather than transferring such models over the network in their entirety, it is typically more efficient to access them remotely using networked services (e.g. model repositories, model indexes). Little attention has been paid so far to the nature of these services, and whether they remain responsive with an increasing number of concurrent clients. This paper extends a previous empirical study on the impact of certain key decisions on the scalability of concurrent model queries on two domains, using an Eclipse Connected Data Objects (CDO) model repository, four configurations of the Hawk model index and a Neo4j-based configuration of the NeoEMF model store. The study evaluates the impact of the network protocol, the API design, the caching layer, the query language and the type of database, and analyses the reasons for their varying levels of performance. The design of the API was shown to make a bigger difference than the network protocol (HTTP/TCP) used. Where available, the query-specific indexed and derived attributes in Hawk outperformed the comprehensive generic caching in CDO. Finally, the results illustrate the ongoing evolution of graph databases: two tools using different versions of the same backend had very different performance, with one slower than CDO and the other faster than it.
Scalability in Model-Driven Engineering (MDE) is often a bottleneck for industrial applications. Industrial-scale models need to be persisted in a way that allows for their seamless and efficient manipulation, often by multiple stakeholders simultaneously. This paper compares the conventional and commonly used persistence mechanisms in MDE with novel approaches such as the use of graph-based NoSQL databases. Prototype integrations of Neo4J and OrientDB with EMF are compared with relational database, XMI and document-based NoSQL database persistence mechanisms. The paper also benchmarks two approaches for querying models persisted in graph databases to compare their relative performance in terms of memory usage and execution time.
XML Metadata Interchange (XMI) is an OMG-standardised model exchange format, which is natively supported by the Eclipse Modeling Framework (EMF) and the majority of the modelling and model management languages and tools. Whilst XMI is widely supported, the XMI parser provided by EMF is inefficient in some cases where models are read-only (such as input models for model query, model-to-model transformation, etc.), as it always requires loading the entire model into memory. In this paper we present a novel algorithm, and a prototype implementation (SmartSAX), which is capable of partially loading models persisted in XMI. SmartSAX offers improved performance, in terms of loading time and memory footprint, over the default EMF XMI parser. We describe the algorithm in detail, and present benchmarking results that demonstrate the substantial improvements of the prototype implementation over the XMI parser provided by EMF.
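The general idea of partial loading can be illustrated with a minimal sketch. This is not the SmartSAX algorithm and does not use EMF; it only assumes that the set of element names a query needs is known in advance, and it uses Python's standard `xml.sax` streaming parser. The names `PartialLoader`, `load_partially` and the sample document are hypothetical.

```python
import xml.sax
from io import BytesIO


class PartialLoader(xml.sax.ContentHandler):
    """Keeps only elements whose tag name is in `wanted`; all others are skipped."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = set(wanted)
        self.loaded = []    # (tag, attributes) pairs retained in memory
        self.skipped = 0    # elements seen on the stream but never materialised

    def startElement(self, name, attrs):
        if name in self.wanted:
            self.loaded.append((name, dict(attrs.items())))
        else:
            self.skipped += 1


def load_partially(xml_bytes, wanted):
    """Streams the document once and returns the handler holding the retained elements."""
    handler = PartialLoader(wanted)
    xml.sax.parse(BytesIO(xml_bytes), handler)
    return handler


# Hypothetical sample model, not taken from the paper.
SAMPLE = b"""<?xml version="1.0"?>
<library name="demo">
  <book title="A"/>
  <book title="B"/>
  <member name="Ann"/>
</library>"""

result = load_partially(SAMPLE, {"book"})
print(result.loaded, result.skipped)
```

Because the parser is event-based, elements outside the requested set are never turned into in-memory objects, which is the source of the memory saving in this kind of approach.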
CCS Concepts • Software and its engineering → Software development methods.
Large-scale software repository mining typically requires substantial storage and computational resources, and often involves a large number of calls to (rate-limited) APIs such as those of GitHub and StackOverflow. This creates a growing need for distributed execution of repository mining programs to which remote collaborators can contribute computational and storage resources, as well as API quotas (ideally without sharing API access tokens or credentials). In this paper we introduce CROSSFLOW, a novel framework for building distributed repository mining programs. We demonstrate how CROSSFLOW can delegate mining jobs to remote workers and cache their results, and how workers can implement advanced behaviour such as load balancing and rejecting jobs they cannot perform (e.g. due to lack of space, credentials for a specific API).
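The delegation behaviour described above (workers rejecting jobs they cannot perform, and results being cached) can be sketched in a few lines. This is an in-process illustration only and does not use the CROSSFLOW API; `Worker`, `dispatch`, the job dictionary layout and the API names are all hypothetical.

```python
class Worker:
    """A worker that only accepts jobs for the APIs it holds credentials for."""

    def __init__(self, name, apis):
        self.name = name
        self.apis = set(apis)

    def accepts(self, job):
        # Reject jobs that need an API this worker cannot call.
        return job["api"] in self.apis

    def run(self, job):
        return f"{self.name} mined {job['repo']} via {job['api']}"


def dispatch(jobs, workers, cache):
    """Offers each job to the workers in order; reuses cached results when present."""
    results = {}
    unassigned = []
    for job in jobs:
        key = (job["api"], job["repo"])
        if key in cache:
            results[key] = cache[key]
            continue
        for worker in workers:
            if worker.accepts(job):
                cache[key] = results[key] = worker.run(job)
                break
        else:
            # No worker accepted the job.
            unassigned.append(job)
    return results, unassigned


workers = [Worker("w1", {"github"}), Worker("w2", {"stackoverflow"})]
jobs = [
    {"api": "github", "repo": "a/b"},
    {"api": "stackoverflow", "repo": "q/1"},
    {"api": "gitlab", "repo": "c/d"},
]
cache = {}
results, unassigned = dispatch(jobs, workers, cache)
print(results, unassigned)
```

In this sketch the credentials never leave the worker: the dispatcher only learns whether a job was accepted, which mirrors the goal of contributing API quotas without sharing access tokens.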
With the increase in the complexity of software systems, the size and the complexity of the underlying models also increase. In a low-code system, models can be stored in different backend technologies and can be represented in various formats. Tailored high-level query languages are used to query such heterogeneous models, but typically this has a significant impact on performance. Our main aim is to propose optimization strategies that can help to query large models in various formats efficiently. In this paper, we present an approach based on compile-time static analysis and specific query optimizers/translators to improve the performance of complex queries over large-scale heterogeneous models. The proposed approach aims to reduce query execution time and memory footprint compared to naive query execution on low-code platforms. CCS Concepts • Software and its engineering → Model-driven software engineering.