The simultaneous-source technique has been widely and successfully applied in seismic field acquisition over the past decade. Many studies have shown that inversion-based source-separation algorithms are more robust than filtering-based methods. However, inversion-based methods depend on accurate shot times. Here, we tackle the long-standing problem of shot times that are inaccurate or unavailable during iterative inversion by proposing a joint inversion framework that simultaneously separates the blended sources and inverts for the shot times. We formulate a non-linear inverse problem with two unknowns, the unblended data and the shot time vector. In the first step, a Gauss-Newton method iteratively inverts for the shot time vector given the current estimate of the unblended data. In the second step, the estimated shot time vector is held fixed while the sources are separated iteratively within a traditional deblending framework. The two steps alternate until they converge or a maximum number of iterations is reached. We demonstrate the proposed method on several synthetic and field data examples. The results show that the joint inversion framework is effective and that the low-frequency component of the data plays an important role in the non-linear inversion for the shot time vector.
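As a rough illustration of the shot-time step only, the following minimal sketch applies a Gauss-Newton update to a single scalar time shift. It is a toy under stated assumptions, not the framework proposed here: the signal is taken to be a Ricker wavelet with a known peak frequency, there is one shot and no blending, and the names `ricker`, `ricker_dt`, and `estimate_shot_time` are hypothetical. Running it with a low and a high peak frequency from the same 30 ms initial error hints at why low frequencies can help a non-linear time-shift inversion.

```python
import numpy as np

def ricker(t, f0):
    """Ricker wavelet with peak frequency f0 (Hz), centred at t = 0."""
    a = (np.pi * f0 * t) ** 2
    return (1.0 - 2.0 * a) * np.exp(-a)

def ricker_dt(t, f0):
    """Analytic time derivative of the Ricker wavelet."""
    c = (np.pi * f0) ** 2
    return (-6.0 * c * t + 4.0 * c**2 * t**3) * np.exp(-c * t**2)

def estimate_shot_time(d, t, f0, tau0, n_iter=20):
    """Gauss-Newton estimate of a scalar delay tau such that
    ricker(t - tau, f0) matches the observed trace d.
    Toy assumption: the wavelet shape is known exactly."""
    tau = tau0
    for _ in range(n_iter):
        r = ricker(t - tau, f0) - d        # residual for the current tau
        J = -ricker_dt(t - tau, f0)        # derivative of the residual w.r.t. tau
        JtJ = J @ J
        if JtJ == 0.0:
            break
        tau -= (J @ r) / JtJ               # Gauss-Newton step
    return tau

# Synthetic test: true shot time 1.0 s, initial guess off by 30 ms.
t = np.arange(0.0, 2.0, 0.002)
tau_true, tau0 = 1.0, 0.97

tau_lf = estimate_shot_time(ricker(t - tau_true, 5.0), t, 5.0, tau0)    # low frequency
tau_hf = estimate_shot_time(ricker(t - tau_true, 40.0), t, 40.0, tau0)  # high frequency

err_lf = abs(tau_lf - tau_true)
err_hf = abs(tau_hf - tau_true)
print(f"5 Hz error: {err_lf:.2e} s, 40 Hz error: {err_hf:.2e} s")
```

In this toy setting, the 5 Hz wavelet is broad enough that the initial guess lies inside the basin of attraction, whereas with the 40 Hz wavelet the shifted and observed pulses barely overlap, so the Gauss-Newton step is tiny and the estimate stays close to the initial guess within the iteration budget.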