In recent years, chatbot applications have become substantially more available as artificial intelligence techniques have advanced, and research on English-language chatbots has been active, producing state-of-the-art solutions. However, despite the popularity of the Arabic language, research on Arabic chatbots is still at an immature stage. Therefore, the main objective of this systematic literature review is to study state-of-the-art research for both the English and Arabic languages in order to answer the proposed research questions regarding the development approaches, application domains, evaluation metrics, and development challenges of chatbot applications. The findings show that researchers have devoted more attention to the education domain using retrieval-based approaches, while the generation-based approach has recently grown in popularity for tasks that require producing new responses. The hybrid approach, which combines the two previous approaches to rank multiple candidate responses, shows a performance improvement. In addition, most metrics used to evaluate chatbot performance are human-based, followed by bilingual evaluation understudy (BLEU) and accuracy metrics. However, defining a common framework for evaluating chatbots remains a challenge. Finally, open problems and future directions are highlighted to help in developing chatbots that simulate natural conversations with minimal human interference.