We present a method that skeletonizes the first arriving seismic refractions by machine learning and inverts them for the subsurface velocity model. In this study, first arrivals can be compressed in a low-rank sense with their skeletal features extracted by a well-trained autoencoder. Empirical experiments suggest that the autoencoder's 1 × 1 or 2 × 1 latent vectors vary continuously with respect to the input seismic data. It is, therefore, reasonable to introduce a misfit functional measuring the discrepancies between the predicted and the observed data in a low-dimensional latent space. The benefit of this approach is that an elaborated autoencoding neural network not only refines intrinsic information hidden in the refractions but also improves the quality of inversion for a reliable background velocity model. Numerical tests on both synthetic and field data demonstrate the effectiveness of this method, especially in recovering the low-to-intermediate wavenumber parts of the subsurface velocity distribution. Comparisons are made with the other three relevant methods, the wave-equation travel-time (WT) inversion, the envelope inversion, and the full waveform inversion (FWI). As expected, the cycle skipping problem is alleviated due to the reduction of dimensions of data space. This method outperforms the envelope inversion in resolution, and it is no worse than WT. Moreover, there is no need for careful manual travel-time picking with this methodology. In general, this inversion framework provides an extendable strategy to compress any input data for reconstructing high-dimensional physical parameters.