Many applications generate and handle very large volumes of data, like social networking, cloud applications, public web sites, search engines, scientific simulations, data warehouse and so on. These large volumes of data are coming from a variety of sources and are both unstructured and structured data. In order to transform efficiently this massive data of various types into valuable information and meaningful knowledge, we need large-scale cluster infrastructures. In this context, one challenging problem is to realize an effective resource management of these large-scale cluster infrastructures in order to run distributed data analytics.