Bart Theeten scite author profile

Bart Theeten

5 Publications

39 Citation Statements Received

46 Citation Statements Given

Affiliations

Nokia (Belgium), Nokia (Finland)

Publications

Import2vec: Learning Embeddings for Software Libraries

Theeten, Vandeputte, Cutsem

2019

We consider the problem of developing suitable learning representations (embeddings) for library packages that capture semantic similarity among libraries. Such representations are known to improve the performance of downstream learning tasks (e.g. classification) or applications such as contextual search and analogical reasoning. We apply word embedding techniques from natural language processing (NLP) to train embeddings for library packages ("library vectors"). Library vectors represent libraries by similar context of use as determined by import statements present in source code. Experimental results obtained from training such embeddings on three large open source software corpora reveal that library vectors capture semantically meaningful relationships among software libraries, such as the relationship between frameworks and their plug-ins and libraries commonly used together within ecosystems such as big data infrastructure projects (in Java), front-end and back-end web development frameworks (in JavaScript) and data science toolkits (in Python).

Modeling performance of a parallel streaming engine

Bedini, Sakr, Theeten et al. 2013

While data are growing at a speed never seen before, parallel computing is becoming more and more essential to process this massive volume of data in a timely manner. Concurrent computations have therefore been receiving increasing attention due to the widespread adoption of multi-core processors and the emerging advancements of cloud computing technology. The ubiquity of mobile devices, location services, and sensor pervasiveness are examples of new scenarios that have created the crucial need for building scalable computing platforms and parallel architectures to process vast amounts of generated streaming data. In practice, efficiently operating these systems is hard due to the intrinsic complexity of these architectures and the lack of formal and in-depth knowledge of the performance models and the consequent system costs. The Actor Model theory has been presented as a mathematical model of concurrent computation that has had enormous success in practice and inspired a number of contemporary works in this area. Recently, the Storm system has been presented as a realization of the principles of the Actor Model theory in the context of the large-scale processing of streaming data. In this paper, we present, to the best of our knowledge, the first set of models that formalize the performance characteristics of a practical distributed, parallel and fault-tolerant stream processing system that follows the Actor Model theory. In particular, we model the characteristics of the data flow, the data processing and the system management costs at a fine granularity within the different steps of executing a distributed stream processing job. Finally, we present an experimental validation of the described performance models using the Storm system.

Towards the optimization of a parallel streaming engine for telco applications

Theeten, Bedini, Cogan et al. 2014

Bell Labs Tech. J.

Parallel and distributed computing is becoming essential to process in real time the increasingly massive volume of data collected by telecommunications companies. Existing computational paradigms such as MapReduce (and its popular open-source implementation Hadoop) provide a scalable, fault-tolerant mechanism for large-scale batch computations. However, many applications in the telco ecosystem require an incremental streaming approach to process data in real time and enable proactive care. Storm is a scalable, fault-tolerant framework for the analysis of real-time streaming data. In this paper we provide a motivation for the use of real-time streaming analytics in the telco ecosystem. We perform an experimental investigation into the performance of Storm, focusing in particular on the impact of parameter configuration. This investigation reveals that optimal parameter choice is highly non-trivial, and we use this as motivation to create a parameter configuration engine. As first steps towards the creation of this engine we provide a deep analysis of the inner workings of Storm and provide a set of models describing data flow cost, central processing unit (CPU) cost, and system management cost. © 2014 Alcatel-Lucent

CHive: Bandwidth Optimized Continuous Querying in Distributed Clouds

Theeten, Janssens

2015

IEEE Trans. Cloud Comput.

Ontology-Based Discovery of Data-Driven Services

Bynens, Win, Joosen et al. 2006

Current service technologies are primarily focused on the functionality of services. A significant portion of the available services, however, exhibits a data-driven rather than a functionality-driven character, which makes the current technologies less appropriate. This paper focuses on discovery for data-driven services as part of data federation as an overall goal. The primary requirements and characteristics are discussed and a prototype implementation based on ebXML is presented. Although significant progress has been made, many practical issues remain to be addressed in order to get this model fully operational.
