Spatial filtering with microphone arrays is a technique for obtaining the signal of a target sound source from a specific direction. Typical approaches in audio applications underperform in practical environments with multiple sound sources and diffuse sound. In this contribution we propose a post-filtering technique to suppress the effect of interferers and diffuse sound. The proposed technique uses cross-spectral estimates of the outputs of two beamformers to formulate a time-frequency soft masker. The beamformers' outputs are used only for parameter estimation and not for generating an audio signal. Two sets of beamformer weights, one constant and one adaptive, are applied to the microphone array signals for the parameter estimation. The weights of the constant beamformer are designed to provide a spatially narrow beam pattern that is time and frequency invariant, with unity gain towards the direction of interest. The weights of the adaptive beamformer are formulated using linearly constrained optimization, with two constraints: weighted orthogonality with respect to the constant beamformer weights, and unity gain towards the look direction. The orthogonality constraint provides diffuse sound suppression, while the unity gain constraint ensures a distortionless response. The cross-spectrum of the two beamformer outputs provides the target energy at a given look direction for the post-filter. The study focuses on compact microphone arrays, for which typical beamforming techniques feature a trade-off between noise amplification and spatial selectivity, especially in the low-frequency region. The proposed method is evaluated with instrumental measures and listening tests under different reverberation times, in dual and multi-talker scenarios. 
The evaluation shows that the proposed method performs better than a previous state-of-the-art spatial filter based on cross-pattern coherence, a linearly constrained beamformer, and a Wiener post-filter.
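
To make the processing chain described above concrete, the following is a minimal sketch for a single frequency bin and STFT-domain signals. It is not the authors' implementation. Several pieces are assumptions introduced only for illustration: the function names `lcmv_weights` and `cross_spectral_mask`, the weighting matrix `A` in the orthogonality constraint, the recursive averaging factor `alpha`, and the normalization of the cross-spectrum by the energy of a reference signal `x_ref` to obtain a gain between 0 and 1. The adaptive weights are computed with the standard closed-form solution of a linearly constrained minimum-variance problem.

```python
import numpy as np

def lcmv_weights(R, a, w_c, A):
    """Adaptive weights for one frequency bin (illustrative sketch).

    Minimizes w^H R w subject to two linear constraints:
      w^H a       = 1  (unity gain towards the look direction)
      w^H A w_c   = 0  (weighted orthogonality to the constant weights)
    R: (M, M) spatial covariance, a: (M,) steering vector,
    w_c: (M,) constant beamformer weights, A: (M, M) assumed weighting matrix.
    """
    C = np.column_stack([a, A @ w_c])           # constraint matrix, (M, 2)
    g = np.array([1.0, 0.0], dtype=complex)     # desired responses
    Rinv_C = np.linalg.solve(R, C)              # R^{-1} C without explicit inverse
    # Standard LCMV solution: w = R^{-1} C (C^H R^{-1} C)^{-1} g
    return Rinv_C @ np.linalg.solve(C.conj().T @ Rinv_C, g)

def cross_spectral_mask(y_c, y_a, x_ref, alpha=0.8, eps=1e-12):
    """Soft mask from the cross-spectrum of two beamformer outputs.

    y_c, y_a, x_ref: complex STFT arrays of shape (F, T).
    The normalization by the reference energy is an assumption of this sketch.
    """
    F, T = y_c.shape
    cross = np.zeros(F, dtype=complex)          # smoothed cross-spectrum
    ref = np.zeros(F)                           # smoothed reference energy
    mask = np.empty((F, T))
    for t in range(T):
        # Recursive (exponential) averaging over time frames
        cross = alpha * cross + (1 - alpha) * y_c[:, t] * np.conj(y_a[:, t])
        ref = alpha * ref + (1 - alpha) * np.abs(x_ref[:, t]) ** 2
        target = np.maximum(cross.real, 0.0)    # discard negative energy estimates
        mask[:, t] = np.sqrt(np.clip(target / (ref + eps), 0.0, 1.0))
    return mask
```

In such a sketch the mask would be multiplied with the STFT of a separate signal (for example a reference microphone) rather than with either beamformer output, in line with the statement that the beamformers serve only for parameter estimation. Note that the two constraints must be linearly independent: if `A @ w_c` is parallel to `a`, the constraint system is singular.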