In the era of the Internet of Things (IoT) and Industry 4.0, the indoor usage of smart devices is expected to increase, making their location information more important. Owing to issues such as large delays, high design cost, and limited performance, conventional localization techniques are impractical for indoor IoT applications. In recent years, researchers have proposed a wide range of machine learning (ML)-based indoor localization approaches using Wi-Fi received signal strength indicator (RSSI) fingerprints. This survey provides a summarized investigation of ML-based Wi-Fi RSSI fingerprinting schemes, covering data preprocessing, data augmentation, ML prediction models for indoor localization, and postprocessing, and compares their performance. Because any ML-based study relies heavily on datasets, we dedicate a significant portion of this survey to dataset collection and open-source datasets. To provide direction for future research, we discuss the current challenges and potential solutions related to ML-based indoor localization systems.