Bearings are key components of mechanical equipment, and fault diagnosis is essential to ensuring their safe operation. Driven by industrial big data and deep learning (DL), intelligent fault diagnosis (IFD) has made great progress in recent years. However, most existing methods focus on diagnosing faults in individual bearings, and their feature extraction and fault classification rely on conventional networks and expert knowledge, so they cannot meet the requirements of cross-bearing diagnosis. To fill this research gap, this paper proposes a multi-scale attention-based transfer model (MSATM). First, the collected vibration signals are converted into time–frequency maps that serve as samples. MSATM then combines multi-scale residual learning with an attention mechanism to adaptively extract fault-sensitive features. Finally, the trained MSATM is reused through deep transfer learning to recognize faults in new bearings. Extensive experiments on a bearing benchmark dataset validate the effectiveness and superiority of the proposed method, which offers a promising tool for cross-bearing fault diagnosis.
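As a rough illustration of the first step, the sketch below turns a 1-D vibration signal into a magnitude time–frequency map. The abstract does not say which transform MSATM uses, so the short-time Fourier transform (STFT) with a Hann window, the helper name `stft_map`, and the parameters `frame_len` and `hop` are all assumptions made for illustration. The input is a synthetic signal, not benchmark data.

```python
import numpy as np

def stft_map(signal, frame_len=256, hop=128):
    """Hypothetical helper: magnitude STFT of a 1-D signal.

    Assumes a Hann-windowed STFT; the transform used by MSATM itself is not
    specified. Returns an array of shape (frame_len // 2 + 1, n_frames),
    where rows are frequency bins and columns are time frames.
    """
    signal = np.asarray(signal, dtype=float)
    if signal.ndim != 1 or signal.size < frame_len:
        raise ValueError("signal must be 1-D and at least frame_len samples long")
    window = np.hanning(frame_len)
    n_frames = 1 + (signal.size - frame_len) // hop
    # Slice the signal into overlapping frames, one frame per row.
    frames = np.stack([signal[i * hop : i * hop + frame_len] for i in range(n_frames)])
    # One-sided FFT of each windowed frame, transposed so rows are frequency bins.
    return np.abs(np.fft.rfft(frames * window, axis=1)).T

# Usage: one second of a synthetic 12 kHz signal, a 1 kHz tone plus light noise.
fs = 12_000
t = np.arange(fs) / fs
rng = np.random.default_rng(0)
x = np.sin(2 * np.pi * 1_000 * t) + 0.1 * rng.standard_normal(t.size)
tf_map = stft_map(x)
print(tf_map.shape)  # (129, 92)
```

Each column is the spectrum of one frame, so the result can be treated as a 2-D image and fed to an image-style network; how MSATM resizes or normalizes such maps is not described here.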