seismo!cmcl2!sbcs!lw lw@suny-sbcs@CSNet-Relay

There is a rising demand for facilities to monitor and debug large distributed programs. Although many debuggers exist for serial application programs (e.g., dbx), they are inadequate for distributed programs. User interfaces for monitoring distributed programs have been reported [Mil86], but they do not provide facilities to recreate timing-dependent errors. Distributed debugging via serial debuggers suffers from two problems:

1. serial debuggers are made for one process running on a single processor, whereas distributed programs contain many processes running on different processors in a network;
2. the sequence of events that leads to an error in a distributed program can be quite complex to model, and difficult to capture and recreate using a serial debugger.

A true distributed debugger should handle both problems easily.

Goals and Assumptions

The goals for the Bugnet system for debugging distributed application programs were set long ago [Cur82]. We have recently begun an implementation of Bugnet that should be completed this summer. Bugnet will monitor program execution from one location in a distributed system and give the user requested information about inter-process communication (IPC), input/output (I/O), and execution traces for each process. More significantly, Bugnet will let the programmer detect an error, roll back to a point in the event sequence before the error occurred, and replay the events leading up to it. During replay, the user will be able to replay one process or many; Bugnet will supply previously saved messages from processes that are not being re-executed.
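The partial-replay idea (re-execute some processes, feed them saved messages from the rest) can be illustrated with a small message log. This is a minimal sketch, not Bugnet's actual design: it assumes every delivered message gets one global sequence number, and the names `Message`, `MessageLog`, and `replay_inputs` are hypothetical. Restoring each process's state at the rollback point (e.g., from a checkpoint) is assumed to happen elsewhere and is not shown.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    seq: int        # position of this delivery in the recorded event sequence
    src: str        # sending process
    dst: str        # receiving process
    payload: object


class MessageLog:
    """Hypothetical log of every message delivered during the original run."""

    def __init__(self):
        self.events = []

    def record(self, src, dst, payload):
        msg = Message(len(self.events), src, dst, payload)
        self.events.append(msg)
        return msg


def replay_inputs(log, replayed, start_seq, stop_seq):
    """Build input queues for processes re-executed over [start_seq, stop_seq).

    Messages sent by processes in `replayed` are regenerated by re-execution,
    so only messages from processes that are NOT re-executed come from the log.
    """
    inputs = {proc: deque() for proc in replayed}
    for msg in log.events:
        in_window = start_seq <= msg.seq < stop_seq
        if in_window and msg.dst in inputs and msg.src not in replayed:
            inputs[msg.dst].append(msg)
    return inputs


# Usage: replay only process B between the start of the run and event 3.
log = MessageLog()
log.record("A", "B", "x")
log.record("B", "C", "y")
log.record("C", "B", "z")
queues = replay_inputs(log, {"B"}, 0, 3)
print([m.payload for m in queues["B"]])
```

Replaying B alone, both of B's inputs (from A and from C) come from the log; if C were also re-executed, C's message to B would be regenerated live rather than supplied from the log.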
Furthermore, the user will be able to interact with the distributed program during execution or replay by changing values of local variables, by generating selective snapshots of a group of processes, and by collecting other execution statistics. Changing a program during replay will allow corrections to be tested under the same conditions that previously caused an error. The user will also be able to suspend or awaken a process, trace its execution, and execute it continuously or in single steps. Our implementation of Bugnet for Modula-2 systems on an Ethernet of Sun workstations will take full advantage of the graphics facilities of those machines. Bugnet is based on the following three assumptions:

1. all communication between processes is explicit;
2. all application programs perform input and output by communicating with system-provided processes;
3. no special hardware support for debugging is required.

Assumptions one and two limit the communication mechanisms that the debugger must handle. Assumption one precludes side effects such as assignments to shared variables: processes must communicate by messages as in CLU [Lis79], by rendezvous as in Ada [Ada80], or by remote procedure calls [Nel81]. The last assumption ensures the portability of the debugger. However, without ...
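Why the first two assumptions matter can be shown with a toy transport: if every interaction, including I/O, is an explicit message, a debugger can observe a whole program at a single send path. This is a minimal sketch under those assumptions, not Bugnet's mechanism; `MonitoredNetwork` and the process name `"console"` (standing in for a system-provided I/O process) are hypothetical.

```python
from collections import deque


class MonitoredNetwork:
    """Hypothetical transport in which every message passes through one send path."""

    def __init__(self):
        self.mailboxes = {}
        self.trace = []  # (src, dst, payload) tuples, in the order they were sent

    def register(self, name):
        self.mailboxes[name] = deque()

    def send(self, src, dst, payload):
        # Single observation point: with all communication explicit
        # (assumption 1), recording here captures all process interaction.
        self.trace.append((src, dst, payload))
        self.mailboxes[dst].append((src, payload))

    def receive(self, dst):
        # Returns the oldest pending (src, payload) pair; raises IndexError if empty.
        return self.mailboxes[dst].popleft()


net = MonitoredNetwork()
for name in ("worker", "console"):  # "console" stands in for a system I/O process
    net.register(name)

# Output is an ordinary message to the I/O process (assumption 2), so it is traced too.
net.send("worker", "console", "result = 42")
print(net.receive("console"))
print(net.trace)
```

A shared variable written directly by two processes would bypass `send` entirely, which is the kind of side effect assumption one rules out.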