This paper describes the challenges for document image analysis community for building large digital libraries with diverse document categories. The challenges are identified from the experience of the ongoing activities toward digitizing and archiving one million books. Smooth workflow has been established for archiving large quantity of books, with the help of efficient image processing algorithms. However, much more research is needed to address the challenges arising out of the diversity of the content in digital libraries.