The Box Counting algorithm is a well-known method for computing the fractal dimension of an image. It is often implemented by recursively subdividing the image into a set of regular tiles, or boxes. Parallel implementations typically map the boxes to different compute units and then combine the results to obtain the total number of boxes intersecting a shape. This paper presents a novel and highly efficient method that uses OpenCL kernels to perform the computation on a per-pixel basis. The mapping and reduction stages are performed in a single pass and therefore require only a single kernel to be enqueued. Each instance of the kernel updates the information pertaining to all the boxes containing its pixel and simultaneously increments the box counters at multiple levels, thereby eliminating the need for a separate summation pass. The complete implementation and coding details of the proposed method are outlined. The performance of the method on different processors is analysed with respect to varying image sizes.
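The per-pixel idea can be illustrated with a minimal sequential sketch in Python. This is not the OpenCL implementation described in the paper: the function names `box_counts` and `fractal_dimension` are hypothetical, the image is assumed to be square with a power-of-two side and at least one foreground pixel, and the body of the loop over pixels stands in for what one kernel instance might do. In a real kernel, the "mark the box, then increment the counter" step would presumably need atomic operations so that concurrent work-items do not count the same box twice.

```python
import math


def box_counts(image):
    """Count occupied boxes at every level in one pass over the pixels.

    `image` is a square list of rows (side n = 2**m, an assumption of this
    sketch); non-zero entries are foreground. Returns {box_size: count}.
    """
    n = len(image)
    levels = n.bit_length()              # box sizes 1, 2, 4, ..., n
    occupied = [set() for _ in range(levels)]
    counts = [0] * levels

    for y in range(n):
        for x in range(n):
            if not image[y][x]:
                continue
            # Body of one hypothetical "kernel instance": visit every box
            # that contains pixel (x, y), one box per level.
            for k in range(levels):
                box = (x >> k, y >> k)   # box index at size 2**k
                if box not in occupied[k]:
                    occupied[k].add(box)
                    counts[k] += 1       # would be an atomic increment on a GPU
    return {1 << k: counts[k] for k in range(levels)}


def fractal_dimension(counts):
    """Least-squares slope of log N(s) against log(1/s)."""
    xs = [math.log(1.0 / s) for s in counts]
    ys = [math.log(c) for c in counts.values()]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    den = sum((a - mx) ** 2 for a in xs)
    return num / den


if __name__ == "__main__":
    filled = [[1] * 8 for _ in range(8)]
    diagonal = [[1 if x == y else 0 for x in range(8)] for y in range(8)]
    print(box_counts(filled), fractal_dimension(box_counts(filled)))
    print(box_counts(diagonal), fractal_dimension(box_counts(diagonal)))
```

For a completely filled 8x8 image the estimated dimension comes out at approximately 2, and for a one-pixel-wide diagonal at approximately 1, which matches what box counting should give for a plane and a line. Note that the counters for all box sizes are final as soon as the single pass over the pixels ends; no second summation pass is required in this sketch.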