This article reviews the history and state-of-the-art optical character recognition systems, such as ABBYY FineReader, Tesseract, CuneiForm, with particular attention given to their inner algorithms, including page layout analysis; page segmentation and document skew angle estimation. The overview includes the description and comparison of different methods proposed for the last 30 years in terms of speed and versatility. Critical analysis and discussions about the status of the field and open problems are reported.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.