The image and video processing algorithms are currently crucial for many applications. Hardware implementation of these algorithms provides higher speed for large computation applications. Besides, noise removing is often a typical pre-processing step to enhance the results of later analysis and processing. Median filter is a typical nonlinear filter that is very commonly used for impulse noise elimination in digital image processing. This paper suggests a low-energy median filter hardware design for battery based hardware applications. Approximate solution with high accuracy is investigated to speed up the filtering operation, reduce the area, and consume less power/energy. Pipelining and parallelism are used to optimize the speed and power of this technique. Non-pipelined, two different pipelined structures, and two parallel architectures versions are designed. The design versions are implemented firstly with a Virtex-5 LX110T FPGA then using the UMC 130nm standard cells ASIC technology. The selection and the even-odd sorting-based median filters are also implemented for an equitable comparison with the standard median filtering techniques. The suggested non-pipelined median filter design enhances the throughput by 35% than the highest investigated state-of-the-art. The pipelining enhances the throughput to more than twice its value. Additionally, the parallel architecture decreases the area and the consumed power by around 40%. The simulation results reveal that one of the suggested designs significantly decreases the area, with the same speed as the fastest design in literature without noticeable degrading the accuracy, and a significant decrease in energy consumption by about 60%.