Time-series clustering is an important data mining technique used to gain insight into the mechanisms that generate a time series and to predict its future values. Time-series data are frequently very large, and their elements have a temporal ordering. Time-series clustering methods fall into three groups, depending on whether they work directly on the raw data (in either the frequency or the time domain), indirectly on features extracted from the raw data, or on models built from the raw data. In this paper, we survey and summarize previous work on time-series clustering in application domains ranging from science, engineering, business, finance, and economics to health care and government.
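The three groups differ in what the clustering algorithm actually sees. As a minimal sketch of the second (feature-based) group only, the Python below summarizes each raw series by two features, its mean and population standard deviation, and then groups the feature vectors with plain k-means. The feature choice, the use of k-means, and the function names (`extract_features`, `kmeans`, `cluster_series`) are illustrative assumptions, not methods taken from the surveyed work.

```python
import math
from statistics import mean, pstdev


def extract_features(series):
    """Hypothetical feature choice: summarize a raw series as (mean, population std dev)."""
    return (mean(series), pstdev(series))


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, iterations=20):
    """Plain k-means; centroids start at the first k points so results are deterministic."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: euclidean(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [mean(dim) for dim in zip(*members)]
    return labels


def cluster_series(series_list, k):
    """Feature-based clustering: extract features from each raw series, then cluster them."""
    features = [extract_features(s) for s in series_list]
    return kmeans(features, k)


series = [
    [1, 2, 1, 2, 1, 2],        # low level, small swings
    [10, 30, 10, 30, 10, 30],  # high level, large swings
    [1, 1, 2, 2, 1, 1],        # low level, small swings
    [12, 28, 11, 29, 10, 30],  # high level, large swings
]
labels = cluster_series(series, 2)
print(labels)  # [0, 1, 0, 1]
```

Raw-data methods would instead compare the series point by point, and model-based methods would compare fitted model parameters; only the middle stage (`extract_features`) changes between these groups in a pipeline shaped like this one.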
Ensemble classifiers are widely used to improve the accuracy of Twitter sentiment classification. In the present research, a hybrid model that combines stack-based ensemble classifiers with a dictionary-based classifier is used to classify tweets as positive or negative. To improve classification accuracy, the sentiment score returned by the dictionary-based classifier is added to the feature vector to obtain an enhanced feature set, and the hybrid stack-based ensemble model is trained on this enhanced feature set. Three machine learning classifiers, svmRadial, C5.0, and NB, are used to build the stack-based ensemble, with GLM and RF as meta-learners. Three data sets, namely the Kaggle US Airline Twitter Sentiment data set, the Sentiment 140 Twitter data set, and a real-time, manually labeled data set related to the 'Clean India Mission', are used to implement the proposed model. The caret package for R, run in RStudio, is used to create the stack-based ensemble of classifiers. The results show that the proposed hybrid model, which uses the sentiment score as one of the features, outperformed the individual machine learning classifiers and the other ensemble classifiers, with accuracies of 0.8742223 on the Kaggle US Airline Twitter Sentiment data set, 0.8881453 on the 'Clean India Mission' data set, and 0.9953593 on the Sentiment 140 Twitter data set.
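The key step of the hybrid model is building the enhanced feature set. The Python below is a minimal sketch of that step only, assuming a bag-of-words representation over a fixed vocabulary and a toy dictionary-based scorer (+1 per positive word, -1 per negative word). The word lists and the names `lexicon_score`, `bag_of_words`, and `enhanced_features` are hypothetical; the study itself used the caret package in R, and this is not its code.

```python
# Hypothetical toy lexicon; a real dictionary-based classifier would use a full sentiment lexicon.
POSITIVE = {"good", "great", "love", "clean"}
NEGATIVE = {"bad", "delayed", "dirty", "lost"}


def lexicon_score(tokens):
    """Dictionary-based sentiment score: +1 per positive word, -1 per negative word."""
    pos = sum(1 for t in tokens if t in POSITIVE)
    neg = sum(1 for t in tokens if t in NEGATIVE)
    return pos - neg


def bag_of_words(tokens, vocabulary):
    """Term counts for each word of a fixed vocabulary, in vocabulary order."""
    return [tokens.count(word) for word in vocabulary]


def enhanced_features(text, vocabulary):
    """Bag-of-words vector with the lexicon sentiment score appended as one extra feature."""
    tokens = text.lower().split()
    return bag_of_words(tokens, vocabulary) + [lexicon_score(tokens)]


vocabulary = ["flight", "delayed", "great", "crew"]
print(enhanced_features("Great crew but flight delayed", vocabulary))  # [1, 1, 1, 1, 0]
print(enhanced_features("great great flight", vocabulary))             # [1, 0, 2, 0, 2]
```

In the setup the abstract describes, vectors like these would be the input to the base learners (svmRadial, C5.0, NB), whose predictions are then combined by a GLM or RF meta-learner; the sketch stops before that stacking stage.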
Time-series databases consist of sequences of values that are calculated or retrieved at regular intervals of time. The values or events are measured at equal intervals (hourly, weekly, yearly, etc.). Time-series data are very large and high-dimensional, and they are updated at regular intervals. Time-series data mining is considered an important analysis approach in many application areas, such as medicine, fraud detection, and the stock market. A large amount of research has been carried out in the field of time-series data mining.
Index Terms: computer aided design, historical climate network, summary of day, accounting-fraud detection.
Twitter is used by millions of people across the world, so data collected from Twitter can be highly valuable for research and helpful for decision support. In this paper, the ‘Twitter US Airline’ data set from the Kaggle data repository is used for sentiment classification of customer reviews. The current research aims to implement various machine learning classifiers, stack-based ensemble classifiers, and hybrids of a lexicon-based classifier with other classifiers. Eleven different classification models are implemented for feature sets of different sizes. All 11 models are then re-implemented with the sentiment score of the lexicon-based classifier added as one of the features in the feature set. Results are analyzed by varying the number of input feature variables used in the classification: four feature sets with 301, 501, 701, and 1301 features are used to analyze the variations in the final findings. Chi-square and information gain techniques are used for feature selection. The results show that increasing the number of features increases accuracy up to 701 features; beyond that, accuracy remains stable or decreases as the feature set grows. In addition, the cost of adding the lexicon classifier's sentiment score to the input feature set is nominal, yet it improves the results consistently. WEKA and RStudio are used for analysis and implementation. Accuracy and Kappa are used to represent and compare the performance of the models.
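Chi-square feature selection ranks each term by how strongly its presence depends on the class label, and keeps the top k terms. The Python below is a minimal sketch of the standard 2x2 contingency-table statistic, chi2 = N(AD - BC)^2 / ((A+B)(C+D)(A+C)(B+D)), where A counts documents of the target class containing the term, B other-class documents containing it, C target-class documents without it, and D the rest. Representing each document as a set of tokens and the names `chi_square` and `select_top_k` are assumptions for illustration; the study used WEKA and R, and information gain is not shown here.

```python
def chi_square(docs, labels, term, positive_class):
    """Chi-square statistic for the 2x2 table of (term present?) x (in positive_class?)."""
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        has_term = term in tokens
        in_class = label == positive_class
        if has_term and in_class:
            a += 1  # term present, target class
        elif has_term:
            b += 1  # term present, other class
        elif in_class:
            c += 1  # term absent, target class
        else:
            d += 1  # term absent, other class
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    # A zero marginal means the statistic is undefined; treat the term as uninformative.
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def select_top_k(docs, labels, k, positive_class):
    """Keep the k terms with the highest chi-square score (ties keep alphabetical order)."""
    vocabulary = sorted({t for tokens in docs for t in tokens})
    ranked = sorted(vocabulary,
                    key=lambda t: chi_square(docs, labels, t, positive_class),
                    reverse=True)
    return ranked[:k]


docs = [
    {"great", "crew"},
    {"great", "flight"},
    {"delayed", "flight"},
    {"lost", "bag", "delayed"},
]
labels = ["positive", "positive", "negative", "negative"]
print(select_top_k(docs, labels, 2, "positive"))  # ['delayed', 'great']
```

Here "flight" appears equally often in both classes, so its score is 0 and it would be dropped first; growing k (for example toward the 301 to 1301 range in the abstract) simply admits progressively weaker terms.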