Feature fusion and similarity computation are two core problems in 3D object tracking, especially for object tracking using sparse and disordered point clouds. Feature fusion can make similarity computation more efficient by including target object information. However, most existing LiDAR-based approaches directly use the extracted point cloud features to compute similarity while ignoring how attention over object regions changes during tracking. In this paper, we propose a feature fusion network based on the transformer architecture. Benefiting from the self-attention mechanism, the transformer encoder captures the inter- and intra-relations among different regions of the point cloud. By using cross-attention, the transformer decoder fuses features and includes more target cues into the current point cloud feature to compute the region attentions, which makes the similarity computation more efficient. Based on this feature fusion network, we propose an end-to-end point cloud object tracking framework, a simple yet effective method for 3D object tracking using point clouds. Comprehensive experimental results on the KITTI dataset show that our method achieves new state-of-the-art performance. Code is available at: https://github.com/3bobo/lttr.

Recently, LiDAR-based 3D object tracking has received increasing attention. Benefiting from the development of visual tracking [1,7,13,15,16], most 3D tracking methods [11,20,30] also use a Siamese-like tracking pipeline. The pipeline first feeds the template point clouds of the target object and the search point clouds of the current frame into its top and bottom branches, respectively, then fuses the two-branch features based on similarity. Finally, the fused features are used to localize the object to be tracked. However, compared with visual tracking, LiDAR-based tracking faces more challenges due to the sparsity and disorder of point clouds.
For example, point clouds become much sparser as the distance to the object increases, which hinders feature extraction. Meanwhile,