Immersive audio is crucial for immersive experiences, but the tools that deliver it to listeners are still limited to headphones and bulky loudspeaker arrays. In this paper, we discuss the possibility of using a single loudspeaker, whose emission is shaped by coupling it with acoustic metamaterial lenses, to achieve sound field control and thus deliver localized sound cues in a 2D environment around a listener. Along with sound field control, this approach reduces the form factor of the system. In fact, realizing personal sound zones typically requires at least 13 loudspeakers placed around the listener, whereas metamaterials have already been shown to enable more compact and lighter audio systems: they allow effective directivity control and cancel emissions in unwanted directions, thus helping to deliver the intended acoustic field in the region of interest. However, it was recently found that such a system, just like an optical objective made of only two lenses, suffers from the acoustic equivalent of spherical aberration. In this work, inspired by optics, we use numerical methods to design an additional metasurface that passively corrects this aberration. Perspectives on combining systems of multiple lenses to correct audio systems are also discussed.