Besides technical and fundamental analysis, machine learning and sentiment analysis of unstructured news and comments have been studied extensively for financial market prediction in recent years. It remains unclear how best to combine predictions derived from news, sentiment scores, and financial data. In this study, we propose a methodology to address this issue. Beyond the methodology, this study differs from previous work in its data coverage and in the models used for both sentiment analysis and prediction. Our study produces weekly predictions with ensemble learning and feature selection methods, using 683 variables for stocks traded in the Borsa Istanbul 30 index. In addition, we predicted sentiment scores from news of 18 different sectors and combined both predictions with weighted normalized returns. For the predictions, we used three ensemble learning methods: Random Forests, Extreme Gradient Boosting, and Light Gradient Boosting Machines. Among parameters such as training set length, estimation method, variable selection method, number of variables, and number of models in the prediction method, we took the combination that gave the best result. For sentiment scores, tests were performed using BERT, Word2Vec, XLNet, and Flair methods. Then, we extracted final sentiment scores from the news. With the proposed trading system, we combined the results obtained from the financial variables and the news sentiment scores. The final results show that, in terms of weekly return and accuracy, the combined approach outperformed both predictions based on sentiment scores alone and predictions based on financial data alone.
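The abstract does not spell out how the two prediction streams are weighted and normalized, so the following is only a minimal sketch of one plausible reading: each signal is standardized to z-scores and the two are blended with a single weight. The function names `zscore` and `combine_signals`, the `weight` parameter, and the choice of z-score normalization are all assumptions made for illustration, not the authors' procedure.

```python
from statistics import mean, pstdev


def zscore(values):
    """Standardize a list of numbers to zero mean and unit variance."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant signal carries no ranking information.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def combine_signals(financial_pred, sentiment_score, weight=0.5):
    """Blend two per-stock signals after normalizing each one.

    `weight` is the share given to the financial prediction (assumed
    hyperparameter); the sentiment signal receives `1 - weight`.
    """
    if len(financial_pred) != len(sentiment_score):
        raise ValueError("both signals must cover the same stocks")
    fin = zscore(financial_pred)
    sen = zscore(sentiment_score)
    return [weight * f + (1 - weight) * s for f, s in zip(fin, sen)]
```

Normalizing first keeps a signal measured in returns and a signal measured in sentiment units on a comparable scale before weighting; in practice the weight would have to be tuned on held-out weeks.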
The sentiment analysis of news and social media posts is a growing research area, driven by advances in natural language processing and deep learning techniques. Although various studies address the extraction of sentiment scores from news and other sources for specified stocks or a stock index, there is still a lack of sentiment analysis on more specialized topics such as commodity news. In this paper, several natural language processing techniques, ranging from statistical methods to deep learning-based methods, were applied to commodity news. First, dictionary-based methods were investigated using the most common dictionaries in financial sentiment analysis, such as the Loughran & McDonald and Harvard dictionaries. Then, statistical models were applied to the commodity news with count vectorizer and TF-IDF features. The compression-based NCD (normalized compression distance) was also tested on the labeled data. To improve the results of the sentiment extraction, the news data was processed by state-of-the-art deep learning-based models such as ULMFiT, Flair, Word2Vec, XLNet, and BERT. A comprehensive analysis of all tested models was conducted. The final analysis showed the performance difference between the deep learning-based and statistical models for the sentiment analysis task on commodity news. BERT achieved the best performance among the deep learning models on the given data.
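To make the compression-based approach concrete, the following is a minimal sketch of the normalized compression distance, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed size. The use of `zlib` as the compressor, the space used to join the two texts, and the 1-nearest-neighbour `classify` helper are assumptions for illustration; the abstract does not state which compressor or classifier was used.

```python
import zlib


def compressed_size(text: str) -> int:
    """Size in bytes of the zlib-compressed UTF-8 encoding of `text`."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def ncd(x: str, y: str) -> float:
    """Normalized compression distance between two texts.

    Values near 0 suggest the texts share a lot of structure;
    values near 1 suggest they share little.
    """
    cx = compressed_size(x)
    cy = compressed_size(y)
    cxy = compressed_size(x + " " + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def classify(text: str, labeled_examples):
    """Return the label of the labeled example closest to `text` under NCD.

    `labeled_examples` is a list of (example_text, label) pairs.
    """
    nearest = min(labeled_examples, key=lambda pair: ncd(text, pair[0]))
    return nearest[1]
```

On short headlines the fixed overhead of the compressor dominates the compressed size, so distances are noisy; this is one reason such a baseline may trail trained models on news text.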