In this paper, we introduce the STN-Homography model to directly estimate the homography matrix between an image pair. Unlike most CNN-based homography estimation methods, which use an alternative 4-point homography parameterization, we prove that, after coordinate normalization, the variance of the elements of the 3 × 3 homography matrix is very small, making the matrix suitable for direct regression with a CNN. Building on the proposed STN-Homography, we use a hierarchical architecture that stacks several STN-Homography models to successively reduce the estimation error. Experiments on the MSCOCO dataset show the effectiveness of the proposed method, which significantly outperforms the state of the art. The average processing time of our hierarchical STN-Homography with 1 stage is only 4.87 ms on the GPU, and the processing time with 3 stages is 17.85 ms. The code will soon be open sourced.

Keywords Homography · STN · CNN

Recently, some attempts have been made to tackle homography estimation with CNNs, and they have achieved higher accuracy than the ORB+RANSAC method. HomographyNet [17] defined the homography between two images by the relocation of a set of 4 points, also known as the 4-point homography parameterization. Their model is based on the VGG architecture [18], with 8 convolutional layers, a pooling layer after every 2 convolutions, and 2 fully connected layers with an L2 loss