Drug combination therapies offer many advantages over monotherapy in cancer treatment when addressing tumor heterogeneity. In wet-lab experiments, screening for novel synergistic drug pairs is challenging because of the enormous search space of possible drug pairs. Computational methods have therefore been developed to predict drug pairs with potential synergistic effects. Notwithstanding the success of current models, their ability to generalize to other datasets, as well as the mechanisms underlying chemical-chemical and chemical-sample interactions, remain understudied, which hinders the real-world application of current algorithms. In this paper, we propose a deep neural model termed DTSyn (Dual Transformer model for drug pair Synergy prediction), based on the multi-head attention mechanism, to identify novel drug combinations. We designed a fine-granularity transformer to capture chemical substructure-gene and gene-gene associations and a coarse-granularity transformer to extract chemical-chemical and chemical-cell line interactions. DTSyn achieves the highest area under the receiver operating characteristic curve (ROC AUC), with values of 0.73, 0.78, 0.82, and 0.81 on four different cross-validation tasks, outperforming all competing methods. Further, DTSyn achieves the best true positive rate (TPR) on five independent datasets. The ablation study shows that both transformer blocks contribute to the performance of DTSyn. In addition, DTSyn can extract interactions among chemicals and cell lines, which may represent the mechanisms of drug action. We therefore envision our model as a valuable tool for prioritizing synergistic drug pairs using chemical and transcriptome data.