Researchers in the area of grid/cloud computing perform many of their experiments using simulations that must capture network behavior. In this context, packet-level simulations, which are widely used to study network protocols, are too costly given the typical large scales of the simulated systems and applications. An alternative is to implement network simulations with less costly flow-level models. Several flow-level models have been proposed and implemented in grid/cloud simulators. Surprisingly, published validations of these models, if any, consist of verifications for only a few simple cases. Consequently, even when they have been used to obtain published results, the ability of these simulators to produce scientifically meaningful results is in doubt. This work evaluates state-of-the-art flow-level network models of TCP communication via comparison to packet-level simulation. While it is straightforward to show cases in which previously proposed models lead to good results, we instead follow the critical method, which places model refutation at the center of scientific activity, and systematically seek cases that lead to invalid results. Careful analysis of these cases reveals fundamental flaws and also suggests improvements. One contribution of this work is that these improvements lead to a new model that, while far from perfect, improves upon all previously proposed models in the context of grid or cloud simulation. A more important contribution, perhaps, is provided by the pitfalls and unexpected behaviors encountered in this work, leading to a number of enlightening lessons. In particular, this work shows that model validation cannot be achieved solely by exhibiting (possibly many) "good cases." Confidence in the quality of a model can only be strengthened through an invalidation approach that attempts to prove the model wrong.
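To make the notion of a flow-level model concrete, the sketch below implements max-min fair bandwidth sharing via progressive filling, a classic sharing policy that such models build upon. This is an illustrative toy, not the model proposed in the paper: the function names and data layout are ours, and actual grid/cloud simulators refine this scheme (for instance, to account for RTT effects and protocol overheads).

```python
def max_min_allocation(capacities, routes):
    """Progressive filling: grow all flow rates at the same pace and
    freeze the flows that cross each link as that link saturates.
    capacities: {link: capacity}; routes: {flow: set of links}."""
    alloc = {f: 0.0 for f in routes}
    cap = dict(capacities)
    active = set(routes)
    while active:
        # Largest equal increment before some link saturates.
        users = {l: sum(1 for f in active if l in routes[f]) for l in cap}
        deltas = {l: cap[l] / n for l, n in users.items() if n > 0}
        bottleneck = min(deltas, key=deltas.get)
        delta = deltas[bottleneck]
        for f in active:
            alloc[f] += delta
        for l, n in users.items():
            cap[l] -= delta * n
        # Flows crossing the saturated link can grow no further.
        active = {f for f in active if bottleneck not in routes[f]}
    return alloc

# Two links, three flows; L2 caps f2 and f3 at 25, f1 then gets 75 on L1.
print(max_min_allocation({"L1": 100.0, "L2": 50.0},
                         {"f1": {"L1"}, "f2": {"L1", "L2"}, "f3": {"L2"}}))
```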
Performance analysis through visualization techniques usually suffers from semantic limitations due to the size of parallel applications. Most performance visualization tools rely on data aggregation to work at scale, without any attempt to evaluate the loss of information caused by such aggregations. This paper proposes a technique to evaluate the quality of aggregated representations, using measures from information theory, and to optimize such measures in order to build consistent multiresolution representations of large execution traces.
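As a minimal sketch of what such a measure can look like, the snippet below scores an aggregation by the Kullback-Leibler divergence between the original distribution of values and the distribution reconstructed from group means. The names and the exact measure are our assumptions, chosen to illustrate the information-theoretic idea rather than to reproduce the paper's definitions.

```python
import math

def aggregation_loss(values, groups):
    """KL divergence (bits) between the original values and their
    reconstruction from group means; 0 when every group is homogeneous.
    values: non-negative counts; groups: a partition of the indices."""
    total = sum(values)
    loss = 0.0
    for group in groups:
        mean = sum(values[i] for i in group) / len(group)
        for i in group:
            if values[i] > 0:
                p = values[i] / total          # true distribution
                q = mean / total               # aggregated reconstruction
                loss += p * math.log2(p / q)
    return loss

print(aggregation_loss([10, 10, 0, 20], [[0, 1], [2, 3]]))  # 0.5 bits
```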
This paper presents a study of I/O scheduling techniques applied to the I/O forwarding layer. In high-performance computing environments, applications rely on parallel file systems (PFS) to obtain good I/O performance even when handling large amounts of data. To alleviate the concurrency caused by thousands of nodes accessing a significantly smaller number of PFS servers, intermediate I/O nodes are typically deployed between processing nodes and the file system. Each intermediate node forwards requests from multiple clients to the system, a setup that gives this component the opportunity to perform optimizations such as I/O scheduling. We evaluate scheduling techniques that improve the spatiality and request size of the access patterns. We show that they are only partially effective because the access pattern is not the main factor for read performance in the I/O forwarding layer. A new scheduling algorithm, TWINS, is presented to coordinate the access of intermediate I/O nodes to the data servers. Our proposal decreases concurrency at the data servers, a factor previously shown to negatively affect performance. The proposed algorithm improves read performance from shared files by up to 28% over other scheduling algorithms and by up to 50% over not forwarding I/O.
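The sketch below conveys the window-based idea behind TWINS as we understand it from the abstract: in each time window, an I/O node dispatches requests to only one data server, so nodes whose windows are shifted avoid piling onto the same server. All names and parameters here are hypothetical, and the real algorithm certainly involves more machinery.

```python
from collections import defaultdict, deque

def window_based_dispatch(requests, num_servers, per_window):
    """Toy window-based dispatcher for one I/O node (not the actual
    TWINS implementation). Each window is owned by a single data
    server, visited round-robin; up to per_window queued requests
    for that server are forwarded during its window.
    requests: list of (server_id, request_payload)."""
    queues = defaultdict(deque)
    for server, payload in requests:
        queues[server].append(payload)
    dispatched, window = [], 0
    while any(queues.values()):
        target = window % num_servers        # server owning this window
        for _ in range(min(per_window, len(queues[target]))):
            dispatched.append((window, target, queues[target].popleft()))
        window += 1
    return dispatched

# Requests to two servers are serialized into alternating windows.
print(window_based_dispatch([(0, "a"), (1, "b"), (0, "c")],
                            num_servers=2, per_window=4))
```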
Understanding the behavior of large-scale distributed systems is generally extremely difficult, as it requires observing a very large number of components over long periods of time. Most analysis tools for distributed systems gather basic information such as individual processor or network utilization. Although scalable thanks to the data reduction techniques applied before the analysis, these tools are often insufficient to detect or fully understand anomalies in the dynamic behavior of resource utilization and their influence on application performance. In this paper, we propose a methodology for detecting resource usage anomalies in large-scale distributed systems. The methodology relies on four functionalities: characterized trace collection, multi-scale data aggregation, specifically tailored user interaction techniques, and visualization techniques. We show the efficiency of this approach through the analysis of simulations of the Berkeley Open Infrastructure for Network Computing (BOINC) volunteer computing architecture. Three scenarios are analyzed in this paper: analysis of the resource sharing mechanism, resource usage considering response time instead of throughput, and the evaluation of input file size on the BOINC architecture. The results show that our methodology enables easy identification of resource usage anomalies, such as unfair resource sharing, contention, moving network bottlenecks, and harmful short-term resource sharing.
High-performance applications are composed of many processes that are executed on large-scale systems with possibly millions of computing units. A possible way to conduct a performance analysis of such applications is to register in trace files the behavior of all processes belonging to the same application. The large number of processes and the very detailed behavior that can be recorded about them lead to a trace size explosion in both the space and time dimensions. The performance visualization of such data is very challenging because of the quantities involved and the limited screen space available to draw them all. If the amount of data is not properly treated for visualization, the analysis may give a wrong idea about the behavior registered in the traces. The contribution of this paper is twofold: first, it details data aggregation techniques that are fully configurable by the user to control the level of detail in both the space and time dimensions; second, it presents two visualization techniques that take advantage of the aggregated data to scale. These features are part of the Viva open-source tool and framework, which is also briefly described in this paper.
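To illustrate what user-configurable aggregation in both dimensions means in practice, here is a minimal sketch (not Viva's actual code): the caller chooses the temporal resolution through a slice length and the spatial resolution through a grouping function over processes.

```python
from collections import defaultdict

def aggregate_trace(events, time_slice, group_of):
    """events: (process, timestamp, value) records; time_slice sets the
    temporal resolution; group_of maps a process to its spatial group
    (e.g. its host or cluster). Returns the mean value per cell."""
    cells = defaultdict(list)
    for process, timestamp, value in events:
        cells[(group_of(process), int(timestamp // time_slice))].append(value)
    return {cell: sum(vs) / len(vs) for cell, vs in cells.items()}

# Aggregate per-process CPU samples into 10-second slices per cluster.
trace = [("p0", 1.0, 0.9), ("p1", 2.0, 0.7), ("p0", 12.0, 0.4)]
print(aggregate_trace(trace, 10.0, lambda p: "cluster0"))
```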
Highly distributed systems such as grids are used today for the execution of large-scale parallel applications. The behavior analysis of these applications is not trivial. The complexity arises from the event correlation among processes, external influences like time-sharing mechanisms and saturation of network links, and also the amount of data that registers the application behavior. Almost all visualization tools for the analysis of parallel applications offer a space-time representation of the application behavior. This paper presents a novel technique that combines traces from grid applications with a treemap visualization of the data. With this combination, we dynamically create an annotated hierarchical structure that represents the application behavior for the selected time interval. Experiments on the grid show that our technique can readily be used for the analysis of large-scale parallel applications with thousands of processes.
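For readers unfamiliar with treemaps, the sketch below shows the core layout idea on an annotated hierarchy: each node's rectangle is subdivided among its children in proportion to their weights. We use the simple slice-and-dice rule here; the tool described in the paper may well use a more elaborate layout such as squarified treemaps.

```python
def treemap_layout(node, x, y, w, h, depth=0, out=None):
    """Slice-and-dice treemap: split horizontally at even depths and
    vertically at odd ones, proportionally to subtree weights.
    node: (name, weight, children); returns (name, x, y, w, h) tuples."""
    if out is None:
        out = []
    name, weight, children = node
    out.append((name, x, y, w, h))
    if children:
        total = sum(child[1] for child in children)
        offset = 0.0
        for child in children:
            frac = child[1] / total
            if depth % 2 == 0:   # split the horizontal extent
                treemap_layout(child, x + offset * w, y, w * frac, h,
                               depth + 1, out)
            else:                # split the vertical extent
                treemap_layout(child, x, y + offset * h, w, h * frac,
                               depth + 1, out)
            offset += frac
    return out

# A site with two processes weighted by time spent in a given state.
tree = ("site", 10.0, [("p0", 3.0, []), ("p1", 7.0, [])])
print(treemap_layout(tree, 0.0, 0.0, 1.0, 1.0))
```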