The goal for CMS computing is to maximise the throughput of simulated event generation while also processing the real data events as quickly and reliably as possible. To maintain this achievement as the quantity of events increases, since the beginning of 2011 CMS computing has migrated at the Tier 1 level from its old production framework, ProdAgent, to a new one, WMAgent. The WMAgent framework offers improved processing efficiency and increased resource usage as well as a reduction in manpower. In addition to the challenges encountered during the design of the WMAgent framework, several operational issues have arisen during its commissioning. The largest operational challenges were in the usage and monitoring of resources, mainly a result of a change in the way work is allocated. Instead of work being assigned to operators, all work is centrally injected and managed in the Request Manager system and the task of the operators has changed from running individual workflows to monitoring the global workload. In this report we present how we tackled some of the operational challenges, and how we benefitted from the lessons learned in the commissioning of the WMAgent framework at the Tier 2 level in late 2011. As case studies, we will show how the WMAgent system performed during some of the large data reprocessing and Monte Carlo simulation campaigns.
CMS has started the process of rolling out a new workload management system. This system is currently used for reprocessing and monte carlo production with tests under way using it for user analysis. It was decided to combine, as much as possible, the production/processing, analysis and T0 codebases so as to reduce duplicated functionality and make best use of limited developer and testing resources. This system now includes central request submission and management (Request Manager); a task queue for parcelling up and distributing work (WorkQueue) and agents which process requests by interfacing with disparate batch and storage resources (WMAgent).
This paper describes the development of a digital-based Beam Position System which was designed, developed, and adapted for the Tevatron during Collider Run II.
Scientific analysis and simulation requires the processing and generation of millions of data samples. These processing and generation tasks are often comprised of multiple smaller tasks divided over multiple (computing) sites. This paper discusses the Compact Muon Solenoid (CMS) workflow infrastructure, and specifically the Python based workflow library which is used for so called task lifecycle management. The CMS workflow infrastructure consists of three layers: high level specification of the various tasks based on input/output datasets, life cycle management of task instances derived from the high level specification and execution management. The workflow library is the result of a convergence of three CMS subprojects that respectively deal with scientific analysis, simulation and real time data aggregation from the experiment.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.