Containers are increasingly used for software deployment because of the modularity they offer for packaging and isolating applications. However, this requires a reliable communication system between computing elements in different containers. Hence, conventional messaging systems have evolved and adapted to increasing loads and Edge Computing. Inter-container communications, with Message Oriented Middleware, provide insight into the execution of distributed applications, as well as a deeper input for analysis. However, detecting message losses and slowdowns on this type of infrastructure is a challenge. Current tracing solutions for this task were compared to identify shortcomings and possible improvements. New tracing methods are proposed to address these shortcomings and open the door to more versatile tracing tools. This paper focuses on the approaches taken to extract information from messages and achieve advanced analysis. Two new methods are presented, each providing a detailed picture of the distributed system, while being better suited for different use cases, depending on the environmental constraints.
Due to the ever-increasing number of computer nodes in distributed systems, efficient and effective tools have become crucial for their analysis. Although several efficient methods have been proposed to monitor and profile distributed systems, tracing remains the most effective solution for in-depth system analysis. Tracing is the act of collecting a trace, which is a sequence of low-level events generated by the kernel or the userspace. After data collection, the most important part is the event analysis. The paradigm and choice of graphs determine the ability of the user to detect abnormal behaviors and identify their root cause. Although tracing is a highly effective approach to analyzing complex systems, the scalability of the current analysis tools is limited. As a consequence, tracing is often impractical for large distributed systems. This paper identifies the shortcomings of the current approaches, most notably the critical path computation and the trace file transfer between nodes. It then proposes new solutions to these drawbacks, most notably a distributed algorithm that computes the critical path without aggregating all traces in a single node, and an efficient architecture to perform tracing on distributed systems. These new solutions are made publicly available.