The LHCb (Large Hadron Collider beauty) experiment is designed to study differences between particles and antiparticles, as well as very rare decays in the charm and beauty sectors of the Standard Model, at the LHC. Following the major upgrade for Run 3, the detector will read out all events at the full LHC bunch-crossing frequency of 40 MHz. The LHCb data acquisition system will therefore be subject to a considerably increased data rate, reaching a peak of 40 Tb/s. The second stage of the two-stage event filtering consists of more than 10,000 multithreaded processes, which simultaneously write output files at an aggregate bandwidth of 100 Gb/s. At the same time, a small number of file-moving processes will read files from the same storage and copy them to tape storage. This whole mechanism must run reliably for months and cope with significant fluctuations. Moreover, for cost reasons, it must be built from off-the-shelf components. In this paper we describe LHCb's solution to this challenge. We present the design and the reasons for the design choices, describe the configuration and tuning of the adopted software solution, and report performance figures.