Rationale and Objectives: Large Language Models (LLMs) have the potential to enhance medical training, education, and diagnosis. However, since these models were not originally designed for medical purposes, there are concerns regarding their reliability and safety in clinical settings. This review systematically assesses the utility, advantages, and potential risks of employing LLMs in the field of hematology. Materials and Methods: We searched PubMed, Web of Science, and Scopus databases for original publications on LLMs application in hematology. We limited the search to articles published in English from December 01 2022 to March 25, 2024, coinciding with the introduction of ChatGPT. To evaluate the risk of bias, we used the adapted version of the Quality Assessment of Diagnostic Accuracy Studies criteria (QUADAS-2). Results: Eleven studies fulfilled the eligibility criteria. The studies varied in their goals and methods, covering medical education, diagnosis, and clinical practice. GPT-3.5 and GPT-4's demonstrated superior performance in diagnostic tasks and medical information propagation compared to other models like Google's Bard (currently called Gemini). GPT-4 demonstrated particularly high accuracy in tasks such as interpreting hematology cases and diagnosing hemoglobinopathy, with performance metrics of 76% diagnostic accuracy and 88% accuracy in identifying normal blood cells. However, the study also revealed discrepancies in model consistency and the accuracy of provided references, indicating variability in their reliability. Conclusion: While LLMs present significant opportunities for advancing clinical hematology, their incorporation into medical practice requires careful evaluation of their benefits and limitations. Key Words: Hematology; Large Language Models; ChatGPT; Microsoft Bing; Google Bard; PaLM; LlaMA.