Baleen whales produce a wide variety of frequency-modulated calls. Extraction of the time–frequency (TF) structures of these calls forms the basis for many applications, including abundance estimation and species recognition. Typical methods to extract the contours of whale calls from a spectrogram are based on the short-time Fourier transform and are, thus, restricted by a fixed TF resolution. Considering the low-frequency nature of baleen whale calls, this work represents the contours using a pseudo-Wigner–Ville distribution for a higher TF resolution at the cost of introducing cross terms. An adaptive threshold is proposed followed by a modified Gaussian mixture probability hypothesis density filter to extract the contours. Finally, the artificial contours, which are caused by the cross terms, can be removed in post-processing. Simulations were conducted to explore how the signal-to-noise ratio influences the performance of the proposed method. Then, in experiments based on real data, the contours of the calls of three kinds of baleen whales were extracted in a highly accurate manner (with mean deviations of 5.4 and 0.051 Hz from the ground-truth contours at sampling rates of 4000 and 100 Hz, respectively) with a recall of 75% and a precision of 78.5%.