Seismic wavelet extraction is a fundamental component of seismic data analysis. It aims to estimate the source wavelet so that subsurface reflectivity can be better recovered from seismic signals and well-log measurements can be calibrated against seismic data. Because of rock heterogeneity, however, seismic energy attenuates as it propagates, so the wavelet varies from shallow to deep and from near to far offsets, making it time-variant. While machine learning (ML) appears well suited to this challenge, representative training labels cannot be prepared without a comprehensive understanding of seismic propagation, which makes supervised learning less applicable. This study proposes applying self-supervised learning to time-variant seismic wavelet extraction. Specifically, the proposed network is a dual-task auto-encoder (DTAE). Starting from 1-D seismic amplitude, the DTAE first uses an encoder to extract a set of features at multiple scales. These features are then split into two flows: one passes through a decoder that reconstructs the input 1-D seismic signal, and the other passes through a few convolutional layers that match the spectrum of the extracted wavelet to that of the seismic trace. Correspondingly, the objective function for training the DTAE consists of two parts: the mean squared error of the amplitude reconstruction and the mean squared error of the spectrum matching. The proposed workflow is tested on various field datasets. Compared with statistical wavelets, the extracted wavelets better match the spectral variation of the seismic data from shallow to deep and from near to far offsets.
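
The two-part objective described above can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation. The function names (`amplitude_spectrum`, `dual_task_loss`), the peak normalization of the spectra, the zero-padding of the wavelet to the trace length, and the `weight` parameter balancing the two terms are all assumptions introduced for illustration; the text above states only that the loss combines a reconstruction error and a spectrum-matching error.

```python
import cmath
import math


def amplitude_spectrum(signal):
    """Amplitude spectrum at non-negative frequencies via a naive DFT."""
    n = len(signal)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(
            signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        spectrum.append(abs(acc))
    return spectrum


def normalize(spectrum):
    """Scale a spectrum to unit peak (assumed here so only the shape is compared)."""
    peak = max(spectrum)
    return [s / peak for s in spectrum] if peak > 0 else list(spectrum)


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def dual_task_loss(trace, reconstructed, wavelet, weight=1.0):
    """Reconstruction MSE plus weighted spectrum-matching MSE.

    `weight` is a hypothetical balancing factor, not taken from the source.
    """
    if len(wavelet) > len(trace):
        raise ValueError("wavelet must not be longer than the trace")
    # Task 1: how well the decoder output reproduces the input amplitudes.
    recon_loss = mse(trace, reconstructed)
    # Task 2: how well the wavelet spectrum matches the trace spectrum.
    # Zero-pad the wavelet so both spectra are sampled at the same frequencies.
    padded = list(wavelet) + [0.0] * (len(trace) - len(wavelet))
    spec_loss = mse(
        normalize(amplitude_spectrum(trace)),
        normalize(amplitude_spectrum(padded)),
    )
    return recon_loss + weight * spec_loss


if __name__ == "__main__":
    trace = [0.0, 1.0, 0.0, -1.0, 0.0, 0.5, 0.0, -0.5]
    short_wavelet = [1.0, -0.5, 0.25]
    print(dual_task_loss(trace, trace, short_wavelet))
```

In this sketch a perfect reconstruction drives the first term to zero, so the remaining loss reflects only the spectral mismatch between the trace and the candidate wavelet. In an actual network both terms would be computed on tensors and minimized jointly by backpropagation.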