Big data is a celebrated topic in Business as well as research community for several years. With the revolution of Big Data, it is becoming easy and less expensive to store tremendous amount of data for future analysis. Weather data gets accumulated very fast and on a large scale. Thorough analysis and research is required on handling this big data and utilizing it for accurate weather prediction. As deterministic weather forecasting models are usually time consuming, it becomes challenging to efficiently use this large volume of data in hand. Machine learning methods are already proved to be good replacement for traditional deterministic approaches in weather prediction. These algorithms are popular for their scalability and hence more suitable in big data solutions. This paper proposes an approach of processing such Big volume of weather Data using Hadoop. Proposal includes Artificial Neural Network implemented on Map-reduce framework for short term rainfall prediction. Rainfall is predicted one day ahead using temperature and rainfall data of immediately preceding days. Temperature and Rainfall data of India over past 63 years is used for this study.
This article presents a stream mining framework to cluster the data stream and monitor its evolution. Even though concept drift is expected to be present in data streams, explicit drift detection is rarely done in stream clustering algorithms. The proposed framework is capable of explicit concept drift detection and cluster evolution analysis. Concept drift is caused by the changes in data distribution over time. Relationship between concept drift and the occurrence of physical events has been studied by applying the framework on the weather data stream. Experiments led to the conclusion that the concept drift accompanied by a change in the number of clusters indicates a significant weather event. This kind of online monitoring and its results can be utilized in weather forecasting systems in various ways. Weather data streams produced by automatic weather stations (AWS) are used to conduct this study.
In the case of real-world data streams, the underlying data distribution will not be static; it is subject to variation over time, which is known as the primary reason for concept drift. Concept drift poses severe problems to the accuracy of a model in online learning scenarios. The recurring concept is a particular case of concept drift where the concepts already seen in the past reappear as the stream evolves. This problem is not yet studied in the context of stream clustering. This paper proposes a novel algorithm for identifying the recurring concepts in data stream clustering. During concept recurrence, the most matching model is retrieved from the repository and reused. The algorithm has minimum memory requirements and works online with the stream. Some of the concepts and definitions, already familiar in concept recurrence studies of stream classification have been redefined for clustering. The experiments conducted on real and synthetic data streams reveal that the proposed algorithm has the potential to identify recurring concepts.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.