Most existing air target intention recognition methods use only single-moment information, risking failure when the acquired data contain noise and many outliers. The robustness of methods that utilize continuous-moment information has yet to be explored. This paper designs a robust air target intention recognition method to address these problems. The method targets data contaminated by noise and outliers and is built on a parallel time-channel Transformer Encoder and a weight self-learning unit. First, a detailed introduction to air target intention recognition and robust recognition is given, and the intention space and feature space are defined. Subsequently, the data samples are reconstructed using a fixed-step sliding window so that multi-moment information serves as input, increasing the information utilized. Finally, step-wise and channel-wise correlations are extracted by a time-axis Transformer Encoder and a channel-axis Transformer Encoder, respectively, and the weights of the two branches are learned automatically by a weight self-learning unit. This enhanced self-attention network allocates attention weights among elements of the time- and channel-domain sequences to capture their long-range and short-range relationships and extract recognizable representations, making it robust to outliers and noise. The experimental results show that the model's recognition accuracy and composite F1 score reach 96.9% and 0.9676, respectively, and that its performance remains strong as the noise level and outlier proportion increase. Ablation and comparison experiments demonstrate its advantage in accuracy over other models.
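The fixed-step sliding-window reconstruction described above can be illustrated with a minimal sketch. This is not the paper's implementation; the function name, window length, and step size are illustrative assumptions. Each reconstructed sample groups several consecutive moments so the model receives multi-moment information rather than a single time step.

```python
# Hypothetical sketch of fixed-step sliding-window sample reconstruction.
# Each original sample is one time step with several feature channels;
# windows of `window` consecutive steps, advanced by `step`, become the
# new multi-moment input samples.
def sliding_windows(sequence, window, step):
    """Return overlapping windows of length `window` taken every `step` steps."""
    return [sequence[i:i + window]
            for i in range(0, len(sequence) - window + 1, step)]

# Example: a track of 6 time steps, 2 feature channels per step
# (e.g. altitude and speed -- illustrative values only).
track = [[1, 10], [2, 20], [3, 30], [4, 40], [5, 50], [6, 60]]
windows = sliding_windows(track, window=3, step=1)
# 4 windows are produced; each is a 3-step multi-moment sample.
```

With step 1 the windows overlap heavily, which is one way such a scheme can increase the number of training samples drawn from the same track.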
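The weight self-learning unit that balances the two encoder branches can be sketched as a softmax-normalized pair of scalar weights. This is a hedged assumption about the mechanism, not the paper's exact architecture: the function and parameter names (`fuse_branches`, `w_time`, `w_chan`) are hypothetical, and in a real model the two scalars would be trainable parameters updated jointly with the network.

```python
import numpy as np

# Hypothetical sketch: combine the time-axis and channel-axis branch
# representations using two learnable scalar weights, normalized by a
# softmax so the branch weights stay positive and sum to one.
def fuse_branches(time_feat, chan_feat, w_time, w_chan):
    """Weighted sum of the two branch feature vectors."""
    w = np.exp(np.array([w_time, w_chan]))
    w = w / w.sum()  # softmax over the two branch weights
    return w[0] * time_feat + w[1] * chan_feat

time_feat = np.array([1.0, 0.0])   # toy time-axis branch output
chan_feat = np.array([0.0, 1.0])   # toy channel-axis branch output
fused = fuse_branches(time_feat, chan_feat, 0.0, 0.0)  # equal raw weights
```

With equal raw weights the softmax assigns 0.5 to each branch; during training the weights would shift toward whichever branch yields more discriminative representations.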