In current era of technology, information acquisition from images and videos become most important task due to the rapid development of data mining and machine learning.The information can be either textual, visual, or combination of these. Text appearing in images or videos is a significant source of information and plays a vital role to perceive it. Developing a unified method to detect the text is hard, as textual properties (i.e. font, size, color, illumination, orientation, etc.) may vary with the complex background. So far, multimedia and computer vision community unable yet to standardize any ideal