The need to process and store massive amounts of data (Big Data) is a reality. In areas such as scientific experiments, social network management, credit card fraud detection, targeted advertising, and financial analysis, massive amounts of information are generated and processed daily to extract valuable, summarized information. Managed object-oriented programming languages (e.g., Java) are the first choice for developing the Big Data platforms (e.g., Cassandra, Spark) on which such applications run. This is due to their fast, and therefore less expensive, development cycle, which stems mainly from automatic memory management, and to their rich community resources.
However, automatic memory management comes at a cost. This cost is introduced by the garbage collector, which is responsible for collecting objects that are no longer in use. Although current (classic) garbage collection algorithms may be adequate for small-scale applications, they are not appropriate for large-scale Big Data environments, as they do not scale in terms of throughput and pause times.
In this work, current Big Data platforms and their memory profiles are studied to understand why classic algorithms (which are still the most commonly used) are not appropriate, and to analyze recently proposed and relevant memory management algorithms targeted at Big Data environments. The scalability of these recent algorithms is characterized, relative to classic algorithms, in terms of throughput (whether they improve the throughput of the application) and pause time (whether they reduce the latency of the application). The study concludes with a taxonomy of the described works and a discussion of open problems in Big Data memory management that could be addressed in future work.