Abstract. For Run 2 of the LHC, LHCb is replacing a significant part of its event filter farm with new compute nodes. To evaluate the best-performing solution, we have developed a method to convert our high-level-trigger application into a stand-alone, bootable benchmark image. With additional instrumentation we turned it into a self-optimising benchmark that explores techniques such as late forking, NUMA balancing and the optimal number of threads, i.e. it automatically optimises box-level performance. We have run this procedure on a wide range of Haswell-E CPUs and numerous other architectures from both Intel and AMD, including the latest Intel micro-blade servers. We present results in terms of performance, power consumption, overheads and relative cost.

Introduction

The LHCb High Level Trigger (HLT) farm [1] is a data centre located at P8 of the LHC accelerator. The farm consists of about 1800 industry-standard computing servers of the type typically used in High Performance Computing (HPC) and other compute-intensive applications, optimised to deliver maximum raw CPU performance per unit price. Since the applications we run are not sensitive to cluster homogeneity, we continuously add new machines whenever sufficient funds become available. As a result, approximately 80% of the machines are currently based on several generations of Intel's Xeon processor family, the remaining 20% being AMD Opteron machines.

The detector produces data at a rate of 50 to 60 GB/s and an event rate of 1 MHz. Since, at the time the experiment was designed, it was not cost-effective to transfer this amount of data from the 100 m underground cavern to the surface, the data centre was built right next to the detector, protected by a radiation shielding wall.
The downside of this is that power and cooling are limited, and the cluster can only grow up to those limits.

Due to additional physics requirements and the luminosity upgrade of the LHC in the upcoming Run 2, we will have to add a significant amount of compute capability: roughly an additional 70% of our current compute power is needed to keep up with the increased requirements. At the current event complexity this corresponds to an increase in trigger rate of 700 kHz. At the same time, this added capacity will have to fit into a power and cooling envelope of 200 kW.

In order to spend our available funds as effectively as possible, we have turned our trigger software into a stand-alone benchmark application. This benchmark can be installed on a Live DVD and distributed to platform integrators to test various computing platforms for their efficiency and to optimise trigger-decision throughput versus price and power demands. As a positive side effect, the benchmark also serves as a basis for optimising machine settings to get the best out of our current machines.

It should be noted that we have decided against using HEP-SPEC, since we have found discrepancies on the order of 10% b...
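The "late forking" technique mentioned above can be illustrated with a minimal sketch: the expensive, read-only initialisation is done once in the parent process, and worker processes are forked only afterwards so that the initialised data is shared between them via copy-on-write pages rather than duplicated. This is an illustrative example, not the actual LHCb benchmark code; the function and data names are invented for the sketch.

```python
import os

def initialize():
    # Expensive, read-only setup done ONCE before forking
    # (in the real HLT this would be geometry, conditions,
    # trigger configuration, etc. -- here just a stand-in table).
    return {"tables": list(range(1000))}

def worker(shared, worker_id):
    # Each child works on its share of events; the initialised
    # data is inherited from the parent via copy-on-write.
    return sum(shared["tables"]) + worker_id

def run_late_fork(n_workers):
    shared = initialize()           # heavy setup first ("late fork")
    pids = []
    for i in range(n_workers):
        pid = os.fork()             # fork only after initialisation
        if pid == 0:                # child process
            worker(shared, i)
            os._exit(0)
        pids.append(pid)
    for _ in pids:                  # parent reaps all children
        os.wait()
    return len(pids)
```

Forking early instead (before `initialize()`) would force every worker to repeat the setup and hold a private copy of the data, which is exactly the per-box overhead the self-optimising benchmark tries to minimise.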
Abstract-The LHCb experiment at CERN will have an Event Filter Farm (EFF) composed of 2000 CPUs. These machines will form a pool of 50 sub-farms with 30 to 40 nodes each, running a large number of High Level Trigger (HLT) tasks in parallel. Although these tasks run identical algorithms, they can run concurrently with different configurations, such as run type (Physics, Cosmics, Test, etc.) or different subdetectors (partitions). The HLT is the second of the two trigger levels in LHCb. Its selection algorithms reduce the incoming data rate of 1 MHz to an output rate of 2 kHz. Selected events are sent to mass storage for subsequent offline reconstruction and analysis. These trigger processes running online are based on the same software framework as the algorithms for offline analysis (Gaudi). The control of the trigger farm was developed with an industrial SCADA system (PVSS), which is used throughout the Experiment Control System (ECS). The HLT algorithms are handled by the ECS like hardware devices, for instance high-voltage channels. The integration of the HLT controls in the overall ECS, which is modeled as finite state machines, will be presented.

Index Terms-Experiment control system (ECS), finite state machine (FSM), gaucho, high level trigger (HLT), PVSS.
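The idea of driving an HLT task through the ECS "like a hardware device" amounts to giving it a finite state machine interface: the control system sends commands, and only commands valid in the current state are accepted. The following is a minimal sketch of that pattern; the class, state and command names are illustrative assumptions, not the actual PVSS/gaucho state machine.

```python
class TriggerNode:
    """Toy FSM modelling how a control system might drive an HLT task.
    State and command names are invented for illustration."""

    # (current_state, command) -> next_state
    TRANSITIONS = {
        ("NOT_READY", "configure"): "READY",
        ("READY", "start"): "RUNNING",
        ("RUNNING", "stop"): "READY",
        ("READY", "reset"): "NOT_READY",
    }

    def __init__(self):
        self.state = "NOT_READY"

    def command(self, cmd):
        # Reject commands that are not valid in the current state,
        # just as an FSM-based control tree would.
        key = (self.state, cmd)
        if key not in self.TRANSITIONS:
            raise ValueError(f"{cmd!r} not allowed in state {self.state}")
        self.state = self.TRANSITIONS[key]
        return self.state
```

In a hierarchical control system such FSMs are composed: a sub-farm node would propagate a `start` command to its children and report an aggregated state back up the tree.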