This article analyzes the performance of several 2D wavelet transform implementations based on the lifting scheme for the CDF 9/7 filter bank on recent CPU architectures. We propose three methods that combine different approaches to memory locality with parallelism, which is obtained using the Intel Threading Building Blocks library. The implementations were tested on a wide range of Intel-based personal computers.

Index Terms: fast lifting wavelet transform, parallel optimization, threading building blocks, 4D layout

I. INTRODUCTION

The two-dimensional wavelet transform is an important tool in many image processing applications. In lossy compression of digital still images, the Cohen-Daubechies-Feauveau 9/7 wavelet (CDF 9/7) is considered the best choice within the classic dyadic approach [10], and hence it is used in the JPEG2000 standard. Given the length of its filters, the standard computational approach, convolution with FIR filter bank structures, is wasteful in both computation and memory. An alternative formulation, the lifting-based wavelet transform, together with an efficient factorization, has been proposed in [5]; it requires far fewer computations to obtain the same result as the standard algorithm.

The fast lifting wavelet transform breaks the original filters into a series of so-called lifting steps, a sequence of upper and lower triangular matrices that are applied to the even and odd parts of the original signal. The algorithm works in place and requires no extra memory for each step, with the exception of temporary buffers used for coefficient reordering. Moreover, owing to the reduced complexity of the steps, the computation time can be reduced to as little as 50 % of that of the original convolution approach, as documented in [8].

Many specific implementations of the fast lifting transform algorithm are described in the literature [1], [2]. In this article we focus on modern CPU architectures.
Recent processors tend to have large caches and often provide some form of hardware parallelism; we try to exploit both by proposing implementations of the transform tailored to these features.
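To make the lifting steps described above concrete, the following is a minimal single-threaded 1D sketch in C++, not the implementation evaluated in this article. It rests on several assumptions that the text does not state: the commonly quoted CDF 9/7 lifting constants, symmetric extension at the signal borders, an even signal length, and one of several scaling conventions in use. The function names `lift_odd`, `lift_even`, `fwt97` and `iwt97` are hypothetical. Coefficients are left interleaved, so no temporary reordering buffer is needed here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Commonly quoted CDF 9/7 lifting constants (assumed; not given in the text).
constexpr double kAlpha = -1.586134342059924;  // first predict step
constexpr double kBeta  = -0.052980118572961;  // first update step
constexpr double kGamma =  0.882911075530934;  // second predict step
constexpr double kDelta =  0.443506852043971;  // second update step
constexpr double kZeta  =  1.149604398860241;  // scaling (convention varies)

// Add c * (left + right neighbour) to every odd-indexed sample.
// Right border: symmetric extension mirrors x[n] onto x[n-2].
inline void lift_odd(std::vector<double>& x, double c) {
    const std::size_t n = x.size();
    for (std::size_t i = 1; i + 1 < n; i += 2) x[i] += c * (x[i - 1] + x[i + 1]);
    x[n - 1] += 2.0 * c * x[n - 2];
}

// Add c * (left + right neighbour) to every even-indexed sample.
// Left border: symmetric extension mirrors x[-1] onto x[1].
inline void lift_even(std::vector<double>& x, double c) {
    const std::size_t n = x.size();
    x[0] += 2.0 * c * x[1];
    for (std::size_t i = 2; i < n; i += 2) x[i] += c * (x[i - 1] + x[i + 1]);
}

// Forward transform, in place. Assumes an even length of at least 2.
// Even indices end up holding approximation (low-pass) coefficients,
// odd indices detail (high-pass) coefficients.
void fwt97(std::vector<double>& x) {
    assert(x.size() >= 2 && x.size() % 2 == 0);
    lift_odd(x, kAlpha);
    lift_even(x, kBeta);
    lift_odd(x, kGamma);
    lift_even(x, kDelta);
    for (std::size_t i = 0; i < x.size(); i += 2) {
        x[i] *= kZeta;
        x[i + 1] /= kZeta;
    }
}

// Inverse transform: undo the scaling, then run the lifting steps in
// reverse order with negated coefficients.
void iwt97(std::vector<double>& x) {
    assert(x.size() >= 2 && x.size() % 2 == 0);
    for (std::size_t i = 0; i < x.size(); i += 2) {
        x[i] /= kZeta;
        x[i + 1] *= kZeta;
    }
    lift_even(x, -kDelta);
    lift_odd(x, -kGamma);
    lift_even(x, -kBeta);
    lift_odd(x, -kAlpha);
}
```

Each step only reads samples of the opposite parity, which is why the transform can overwrite its input and why the inverse is obtained simply by reversing the steps. A separable 2D transform would typically apply such a 1D pass along the rows and then along the columns; how those passes are ordered in memory and distributed over threads is the kind of design choice this article is concerned with.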