Table detection, extraction and annotation have been an important research problem for years. To handle this issue, different approaches have been designed for different types of documents. Among these PDF is a widely used format for preserving and presenting different types of documents. We investigate the state of the art in table detection, extraction and annotation in PDF documents. Because of varying table structural anatomy, the state of the art in table-related research enumerates a number of approaches that are critically and analytically investigated for identifying their strengths and limitations as well as for making recommendations for further improvement. An evaluation framework is contributed that compares different information extraction tools that may be used in table detection, extraction and annotation. We found very limited attention towards these aspects in books, especially books in PDF format. There is no searching solution that can find books having tables that are semantically related to a table in a given book.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.