Abstract-The widely used Message Passing Interface (MPI) is complex and rich. As a result, application developers need automated tools to avoid and detect MPI programming errors. We present the Marmot Umpire Scalable Tool (MUST), which detects such errors with significantly increased scalability. We improve our graph-based deadlock detection approach for MPI so that it covers complex MPI constructs as well as future MPI extensions. Further, these enhancements check complex MPI constructs that no previous graph-based detection approach handled correctly. Finally, we present optimizations for the processing of MPI operations that reduce the overhead of runtime deadlock detection. Existing approaches could require O(p) analysis time per MPI operation for p processes, whereas our improvements lead to a complexity of O(log p) or better for real-world applications. Overhead measurements for two major benchmark suites on up to 1024 cores demonstrate these improvements in real-world scenarios.