Cost aggregation is a highly important process in image matching tasks, which aims to disambiguate the noisy matching scores. Existing methods generally tackle this by hand-crafted or CNN-based methods, which either lack robustness to severe deformations or inherit the limitation of CNNs that fail to discriminate incorrect matches due to limited receptive fields and inadaptability. In this paper, we introduce Cost Aggregation with Transformers (CATs) to tackle this by exploring global consensus among initial correlation map with the help of some architectural designs that allow us to fully enjoy global receptive fields of self-attention mechanism. To this end, we include appearance affinity modeling, which helps to disambiguate the noisy initial correlation maps. Furthermore, we introduce some techniques, including multi-level aggregation to exploit rich semantics present at different feature levels and swapping self-attention to obtain reciprocal matching scores to act as strong regularization. Although competitive performance can be attained by CATs, it may face some limitations, i.e., high computational costs induced by the use of a standard transformer that its complexity grows with the size of spatial and feature dimensions, which restrict its applicability only at limited resolution and result in rather limited performance. To overcome this, we propose CATs++, an extension of CATs. Concretely, we introduce early convolutions prior to cost aggregation with a transformer to control the number of tokens as well as to inject some convolutional inductive bias, and propose a novel transformer architecture for both efficient and effective cost aggregation, which results in apparent performance boost and cost reduction. With the reduced costs, we manage to compose our network with a hierarchical structure to process higher-resolution inputs. With these combined, we conduct experiments to demonstrate the effectiveness of the proposed model over the latest methods. We evaluate the proposed method on standard benchmarks, including PF-WILLOW, PF-PASCAL, and SPair-71k. Our proposed methods outperform the previous state-of-the-art methods by large margins, setting a new state-of-the-art for all the benchmarks. We also provide extensive ablation studies and analyses.