Combining Distributed and Kernel Tracing for Performance Analysis of Cloud Applications

Gelle, Loïc; Ezzati‐Jivan, Naser; Dagenais, Michel R.

doi:10.3390/electronics10212610

Cited by 7 publications

(7 citation statements)

References 37 publications


“…The techniques mentioned above are specifically suitable for distributed systems because they emphasize context propagation and large-scale tracing. However, it is essential to note that these techniques are limited to capturing high-level events [7] and cannot extract system information related to performance degradation. Researchers [19,20], who work on low-level traces to detect or diagnose software performance degradation, encounter a large volume of events that they need to collect with high accuracy and low overhead.…”

Section: Low-level Software Tracing (mentioning)

confidence: 99%

“…Since low-level and high-level software tracing offer unique advantages, hybrid approaches have been proposed to make the best of both worlds. Gelle et al [7] suggest a solution that combines kernel tracing with distributed tracing, to better scrutinize events and determine the underlying cause of performance issues. By merging the advantages of both approaches, this method can obtain precise information about system interactions and high-level events, from distributed tracing, while collecting detailed and specific low-level events from kernel and user space tracing.…”

Section: Hybrid Software Tracing (mentioning)

confidence: 99%

“…It excels at high-throughput event collection, with a low impact on the target application. However, software tracing is not designed for end-to-end tracing in distributed environments [7].…”

Section: Introduction (mentioning)

confidence: 99%


DTraComp: Comparing distributed execution traces for understanding intermittent latency sources

Ekhlasi, Daneshgar, Dagenais et al. 2023

Preprint


Microservice architectures can enhance software development by using multiple programming languages and deployment infrastructures, isolating failures within individual services, and accelerating the debugging and fixing of issues in independent services. Locating performance degradation becomes challenging, due to the presence of numerous service instances with complex interactions compounded by parallelism. Although end-to-end tracing allows tracing execution paths across services, and detecting their latencies, it is limited to high-level information. Indeed, end-to-end tracing cannot pinpoint the root causes of performance degradation between the processes. Moreover, many existing performance analysis tools lack a comparison feature to give developers a comprehensive view of the performance differences between two groups of requests. This paper introduces DTraComp (Distributed Trace Compare), an open-source framework, compatible with various microservice trace standards, and integrated with Eclipse Trace Compass™. Our framework offers robust visual comparison capability for two groups of executions within distributed systems, which includes nested spans executed in parallel. Furthermore, it provides system kernel details for each thread involved in the execution of each span, allowing it to pinpoint the reasons for performance degradation across distributed systems. We used our proposed framework to analyze five practical use cases. Our evaluation of the tool's efficiency showed that the overall time complexity scales linearly, O(n), with the trace size, indicating its suitability for deployment in production environments. It is currently used at Ericsson for performance evaluation purposes.




“…For distributed tracing, one of the most basic forms is to create a methodology to compare the traces visually [34]. Gelle et al [37] included kernel level events in the high-level logs to the tracing to improve the detection of anomalies and relate them to their root cause. Bento et al [38] went from visualizing tracing data for human inspection to automating the analysis of tracing data.…”

Section: Root Cause Analysis (mentioning)

confidence: 99%

Provenance-enhanced Root Cause Analysis for Jupyter Notebooks

Xin, Stallinga, Liu et al. 2022

2022 IEEE/ACM 15th International Conference on Utility and Cloud Computing (UCC)


With Jupyter notebooks becoming more commonly used within scientific research, more Jupyter notebook-based use cases have evolved to be distributed. This trend makes it more challenging to analyze anomalies and debug notebooks. Provenance data is an ideal option that can create more context around anomalies and make it easier to find the root cause of the anomaly. However, provenance rarely gets investigated in the context of distributed Jupyter notebooks. In this paper, we propose a framework that integrates two data types, provenance and detected performance anomalies based on performance data. We use the combined information to visually show the end user the provenance at the time of the anomaly and the root cause of the anomaly. We build and evaluate the framework with a notebook extended with anomaly-generating functions. The generated anomalies were automatically detected, and the combined information of provenance and anomaly creates a valuable subset of the provenance data around the time an anomaly occurred. Our experiments create a clear and confined context for the anomaly and enable the framework to find the root cause of performance anomalies in Jupyter notebooks.


“…Additionally, arranging the components in a way that maintains system attributes such as availability and low latency introduces intricacies [3,4]. The challenges of debugging [5][6][7][8][9] and component arrangement [10] have been addressed through the implementation of distributed tracing techniques [11][12][13][14][15][16][17][18]. However, applying these methods necessitates instrumenting the application's source code, which brings additional overhead and the risk of altering the application's behavior.…”

Section: Introduction (mentioning)

confidence: 99%

Vnode: Low-Overhead Transparent Tracing of Node.js-Based Microservice Architectures

Kabamba, Khouzam, Dagenais 2023

Future Internet

Self Cite


Tracing serves as a key method for evaluating the performance of microservices-based architectures, which are renowned for their scalability, resource efficiency, and high availability. Despite their advantages, these architectures often pose unique debugging challenges that necessitate trade-offs, including the burden of instrumentation overhead. With Node.js emerging as a leading development environment recognized for its rapidly growing ecosystem, there is a pressing need for innovative performance debugging approaches that reduce the telemetry data collection efforts and the overhead incurred by the environment’s instrumentation. In response, we introduce a new approach designed for transparent tracing and performance debugging of microservices in cloud settings. This approach is centered around our newly developed Internal Transparent Tracing and Context Reconstruction (ITTCR) technique. ITTCR is adept at correlating internal metrics from various distributed trace files to reconstruct the intricate execution contexts of microservices operating in a Node.js environment. Our method achieves transparency by directly instrumenting the Node.js virtual machine, enabling the collection and analysis of trace events in a transparent manner. This process facilitates the creation of visualization tools, enhancing the understanding and analysis of microservice performance in cloud environments. Compared to other methods, our approach incurs an overhead of approximately 5% on the system for the trace collection infrastructure while exhibiting minimal utilization of system resources during analysis execution. Experiments demonstrate that our technique scales well with very large trace files containing huge numbers of events and performs analyses in very acceptable timeframes.

