During exploratory performance testing, software testers evaluate the performance of a software system with different input combinations in order to identify combinations that cause performance problems in the system under test. Performance problems such as low throughput, high response times, hangs, or crashes in software applications have an adverse effect on the customer's satisfaction. Since many of today's largescale, complex software systems (e.g., eCommerce applications, databases, web servers) exhibit very large multi-dimensional input spaces with many input parameters and large ranges, it has become costly and inefficient to explore all possible combinations of inputs in order to detect performance problems. In order to address this issue, we introduce a method for identifying input combinations that trigger performance problems in the software system under test. Our method, under the name of iPerfXRL, employs deep reinforcement learning in order to explore a given large multi-dimensional input space efficiently. The main benefit of the approach is that, during the exploration process, it learns and recognizes the problematic regions of the input space that have a higher chance of triggering performance problems. It concentrates the search in those problematic regions to find as many input combinations as possible that can trigger performance problems while executing a limited number of input combinations against the system. In addition, our approach does not require prior domain knowledge or access to the source code of the system. Therefore, it can be applied to any software system where we can interactively execute different input combinations while monitoring their performance impact on the system. We implement iPerfXRL on top of the Soft Actor-Critic algorithm. We evaluate empirically the efficiency and effectiveness of our approach against alternative state-of-the-art approaches. Our results show that iPerfXRL accurately identifies the problematic regions of the input space and finds up to 9 times more input combinations that trigger performance problems on the system under test than the alternative approaches. INDEX TERMS Exploratory performance testing, Deep reinforcement learning, Test data generation I. INTRODUCTION One of the most critical and challenging tasks for developers is to identify and fix performance problems of software systems [1]. Performance problems such as low throughput, high response times, hangs, or crashes in software applications have an adverse effect on the customer's satisfaction. According to [2], there are higher chances of a software system crashing due to performance problems rather than functional failures. Recent reports [1] show that certain input combinations can trigger more than half of the performance bottlenecks identified in non-trivial software systems. The reason is that certain input combinations can invoke inefficient code sequences or resource-intensive operations, which result in overall system performance degradation commonly referred to as performan...