A key task for botanists is recognizing flowers. This task has become highly challenging because the number of discovered flower types is nearing half a million. Information technology offers promising support for botanists; machine learning techniques in particular are appealing because they can reach the precision the task requires. Two observations about leaves make them useful for flower identification: first, flowering plants exhibit unique features in their leaves, which allows co-located flowers to be distinguished; second, leaves live much longer than flowers and therefore preserve identifying properties longer. This paper proposes machine learning-based identification of rose types using features extracted from their leaves. For this purpose, the performance of Naive Bayes, Generalized Linear Model, Multilayer Perceptron, Decision Tree, Random Forest (RF), Gradient Boosted Trees, and Support Vector Machine is analyzed. The study optimizes the RF model by investigating and tuning its parameters, such as the number of trees, the depth of trees, and the splitting criterion. The best results are achieved with gain ratio, because it normalizes information gain by the split information and thereby avoids the bias of information gain toward attributes with many distinct values. Optimizing the number and depth of trees in RF yields better accuracy than the other models. Extensive experiments are also performed with ensemble algorithms that combine the models' predictions by voting on each instance. Results suggest that the performance of ensemble classifiers is superior to that of individual models.
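The difference between information gain and gain ratio as splitting criteria can be illustrated with a minimal sketch. The function names (`entropy`, `information_gain`, `gain_ratio`) and the toy leaf data are hypothetical and chosen only for illustration; they are not taken from the paper's experiments or implementation.

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(items)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(items).values())


def information_gain(values, labels):
    """Reduction in label entropy after splitting on an attribute."""
    total = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def gain_ratio(values, labels):
    """Information gain divided by the split information of the attribute."""
    split_info = entropy(values)  # entropy of the attribute's own values
    if split_info == 0:
        return 0.0  # attribute has a single value, so it cannot split
    return information_gain(values, labels) / split_info


# Toy data (made up): four leaves from two hypothetical rose types.
labels = ["rose_a", "rose_a", "rose_b", "rose_b"]
leaf_id = [1, 2, 3, 4]                  # unique per sample, no real signal
leaf_shape = ["oval", "oval", "round", "round"]

ig_id = information_gain(leaf_id, labels)
ig_shape = information_gain(leaf_shape, labels)
gr_id = gain_ratio(leaf_id, labels)
gr_shape = gain_ratio(leaf_shape, labels)
```

On this toy data, information gain cannot separate the meaningless `leaf_id` attribute from `leaf_shape`, since both reach the maximum gain of 1 bit. Gain ratio divides by the split information (2 bits for `leaf_id`, 1 bit for `leaf_shape`) and therefore prefers `leaf_shape`.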
Social media influences people’s lives. It is an essential source for sharing news, raising awareness, detecting events, and gauging people’s interests, and it covers a wide range of topics and events. Extensive work has been published on capturing interesting events and insights from datasets, and many techniques have been presented to detect events from social media networks such as Twitter. In text mining, however, most work is evaluated on a specific dataset, so new datasets are needed to analyse the performance and generality of Topic Detection and Tracking methods. Therefore, this paper publishes a dataset of a real-life event, the Oscars 2018, gathered from Twitter, and compares soft frequent pattern mining (SFPM), singular value decomposition and k-means (K-SVD), feature-pivot (Feat-p), document-pivot (Doc-p), and latent Dirichlet allocation (LDA). The dataset contains 2,160,738 tweets collected using a set of seed words; only English tweets are considered. All of the methods applied in this paper are unsupervised. The methods are evaluated on the Oscars 2018 dataset using keyword precision (K-Prec), keyword recall (K-Rec), and topic recall (T-Rec) for detecting events of greater interest. SFPM achieved the highest K-Prec, K-Rec, and T-Rec, but its scores decreased as the number of clusters increased. Feat-p performed worst on all three metrics. Experiments on the Oscars 2018 dataset demonstrated that all the methods are generic in nature and produce meaningful clusters.
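The three evaluation metrics can be sketched in a few lines. This is a simplified reading under stated assumptions, not necessarily the exact procedure used in the paper: each ground-truth topic is matched to the detected topic that shares the most keywords with it, and a topic counts as detected when the overlap reaches a hypothetical `min_overlap` threshold. The function name `evaluate_topics` and the toy keywords are made up for illustration and are not drawn from the published dataset.

```python
def evaluate_topics(detected, ground_truth, min_overlap=1):
    """Return (K-Prec, K-Rec, T-Rec) for lists of keyword sets.

    Assumed matching rule: each ground-truth topic is paired with the
    detected topic that shares the most keywords with it.
    """
    hits = 0            # ground-truth topics that were detected
    correct = 0         # keywords shared by matched topic pairs
    detected_size = 0   # keywords in the matched detected topics
    truth_size = 0      # keywords in the matched ground-truth topics

    for truth in ground_truth:
        best = max(detected, key=lambda d: len(d & truth), default=set())
        overlap = len(best & truth)
        if overlap >= min_overlap:
            hits += 1
            correct += overlap
            detected_size += len(best)
            truth_size += len(truth)

    t_rec = hits / len(ground_truth) if ground_truth else 0.0
    k_prec = correct / detected_size if detected_size else 0.0
    k_rec = correct / truth_size if truth_size else 0.0
    return k_prec, k_rec, t_rec


# Toy keyword sets (made up, not from the Oscars 2018 dataset).
ground_truth = [
    {"best", "picture", "winner"},
    {"red", "carpet", "dress"},
]
detected = [
    {"best", "picture", "award", "night"},
    {"host", "monologue"},
]

k_prec, k_rec, t_rec = evaluate_topics(detected, ground_truth)
```

In this toy case only the first ground-truth topic is matched, so T-Rec is 1/2; two of the four keywords in the matched detected topic are correct (K-Prec of 2/4), and two of the three ground-truth keywords are recovered (K-Rec of 2/3).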