Large number of embedded devices, massive volumes of data, users and applications are driving the digital world to move faster than ever. To be competitive in today's digital economy companies have to process large volumes of dynamically changing data at real-time. There are many industries from health-care, e-commerce, insurance and telecommunications with various use cases such as DNA sequencing, capturing customer insights, real-time offers, high-frequency trading, and real-time intrusion detection that have taken the use of Big Data analytics into account to make critical decisions that impact their business . On the other hand, the Internet of Things (IoT) is becoming the primary grounds for data mining and Big Data analytics . With the rapid growth of IoT and its use cases in different domains such as Smart City, Mobile e-Health and Smart Grid, streaming applications are driving a new wave of data revolutions. In most IoT applications the resulting analytics give some feedbacks to the system to improve it . Compared to the other Big Data domains, there is a low-latency cycle between system
Despite the huge number of researches in Big Data area, approximate computing in this area still remains a challenge. The approximation is used for reduction of resources such as time, cost or energy. Applications that analyze the input data, logs and queries to generate aggregated results or dashboards can benefit from approximation techniques in Big Data. In these applications, the output is much smaller than the input. This fact indicates that approximation can be used for increasing the processing performance for this kind of computation. Data skew causes reduction of performance in approximation. Data skew has many causes [1-3]. In this paper, we focused on the challenge that stems from variety. Data variety can be created by aggregating input data from multiple sources with different statistical distribution and uneven distribution of input data. This uneven distribution causes the data skew. There are many approaches that have addressed the sample-based approximation.
scite is a Brooklyn-based startup that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.