Abstract. Learning from preferences, which provide a means of expressing a subject's desires, constitutes an important topic in machine learning research. This paper presents a comparative study of four alternative instance preference learning algorithms (both linear and nonlinear). The case study investigated is learning to predict the expressed entertainment preferences of children when playing physical games built on their personalized playing features (entertainment modeling). Two of the approaches are derived from the literature, namely the large-margin algorithm (LMA) and preference learning with Gaussian processes, while the remaining two are custom-designed approaches for the problem under investigation: meta-LMA and neuroevolution. Preference learning techniques are combined with feature set selection methods, permitting the construction of effective preference models given suitable individual playing features. The underlying preference model that best reflects children's preferences is obtained through neuroevolution: 82.22% cross-validation accuracy in predicting reported entertainment in the main set of game survey experimentation. The model is able to correctly match expressed preferences in 66.66% of cases on previously unseen data (p-value = 0.0136) of a second physical activity control experiment. Results indicate the benefit of the use of neuroevolution and sequential forward selection for the investigated complex case study of cognitive modeling in physical games.
Cyberbullying is a new phenomenon resulting from the advance of new communication technologies, including the Internet, cell phones and Personal Digital Assistants. It is a challenging bullying problem occurring in a new territory. Online bullying can be particularly damaging and upsetting because it is usually anonymous or hard to trace. In this paper, the proposed method utilizes a dataset of real-world conversations (i.e. pairs of questions and answers between a cyber-predator and the victim), in which each predator question is manually annotated in terms of severity using a numeric label. We approach the issue as a sequential data modelling problem, in which the predator's questions are formulated using a Singular Value Decomposition (SVD) representation. The motivation for this procedure is to study the accuracy of predicting the level of a cyberbullying attack using classification methods, and also to examine potential patterns in the linguistic style of each predator. More specifically, unlike previous approaches that consider a fixed window of a cyber-predator's questions within a dialogue, we exploit the whole question set and model it as a signal whose magnitude depends on the degree of bullying content. Using feature weighting and dimensionality reduction techniques, each signal is parsed by a neural network that forecasts the level of insult within a question given a window of the two to three previous questions. Throughout the time series modeling experiments, an interesting discovery was made: by applying SVD to the time series data and taking into account the second dimension (since the first usually models trivial dependencies between instances and attributes), we observed that its plot was very similar to the plot of the class attribute.
Applying a Dynamic Time Warping algorithm confirmed the similarity of the aforementioned signals, providing an immediate indicator of the severity of cyberbullying within a given dialogue.
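To make the signal comparison concrete, the following is a minimal sketch of the textbook Dynamic Time Warping distance, written in plain Python. It is not the authors' implementation: the function name `dtw_distance`, the absolute-difference cost, and the two toy signals are all illustrative assumptions. In practice the second signal would be the second SVD dimension of the dialogue data, which is assumed here to have been computed already and scaled comparably to the severity labels.

```python
def dtw_distance(a, b):
    """Textbook O(len(a) * len(b)) dynamic time warping distance.

    Uses the absolute difference between points as the local cost
    (an assumption; the abstract does not specify the cost function).
    """
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: step in a, step in b, or step in both.
            cost[i][j] = local + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return cost[n][m]


# Hypothetical data: annotated severity per question, and a stand-in for
# the second SVD dimension that traces the same shape with a one-step lag.
severity = [0, 1, 1, 2, 3, 3, 1, 0]
svd_second_dim = [0, 0, 1, 1, 2, 3, 3, 1, 0]

print(dtw_distance(severity, svd_second_dim))
```

A small distance relative to the signal length suggests the two curves follow the same shape even when they are shifted or stretched in time, which is the kind of similarity the abstract reports.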
Abstract. Electronic Participation (eParticipation), both in its traditional form and in its emerging Web 2.0 based form, results in the production of large quantities of textual contributions of citizens concerning government policies and decisions under formation. These contributions contain valuable relevant opinions and knowledge of the society, but are exploited only to a limited extent. It is of critical importance to analyze these contributions in order to extract the opinions and knowledge they contain in a cost-efficient way. This paper reviews a wide range of opinion mining methods, which have been developed for analyzing commercial product opinions and reviews posted on the Web, as to the capabilities they can offer for meeting the above challenges. The review has revealed the great potential of these methods for the analysis of citizens' textual contributions in public policy debates, both for assessing contributors' general attitudes-sentiments (positive, negative or neutral) towards the policy/decision under discussion, and also for extracting the main issues they raise (e.g. negative and positive aspects and effects, implementation barriers, improvement suggestions) and the corresponding attitudes-sentiments. Based on the conclusions of this review, a basic framework for the use of opinion mining methods in eParticipation has been formulated.
Sentiment analysis has played a primary role in text classification. Some years ago, textual information was spreading at manageable rates; nowadays, however, such information has exceeded even the most ambitious expectations and constantly grows within seconds. It is therefore quite complex to cope with the vast amount of textual data, particularly if we also take the incremental production speed into account. Social media, e-commerce, news articles, comments and opinions are broadcast on a daily basis. A rational solution for handling the abundance of data would be to build automated information processing systems for analyzing and extracting meaningful patterns from text. The present paper focuses on sentiment analysis applied to Greek texts. Thus far, there is no wide availability of natural language processing tools for Modern Greek. Hence, a thorough analysis of Greek, from the lexical to the syntactical level, is difficult to perform. This paper attempts a different approach, based on the proven capabilities of gradient boosting, a well-known technique for dealing with high-dimensional data. The main rationale is that, since English has dominated the area of preprocessing tools and there are also quite reliable translation services, we could exploit them to transform Greek tokens into English, thus assuring the precision of the translation, since the translation of large texts is not always reliable and meaningful. The new feature set of English tokens is augmented with the original set of Greek tokens, consequently producing a high-dimensional dataset that poses certain difficulties for any traditional classifier. Accordingly, we apply gradient boosting machines, an ensemble algorithm that can learn with different loss functions, providing the ability to work efficiently with high-dimensional data.
Moreover, for the task at hand, we deal with a class imbalance issue, since the distribution of sentiments in real-world applications is often unequal. For example, in political forums or electronic discussions about immigration or religion, negative comments overwhelm the positive ones. The class imbalance problem was confronted using a hybrid technique that performs a variation of under-sampling the majority class and over-sampling the minority class. Experimental results, considering different settings, such as translation of tokens against translation of sentences, consideration of limited Greek text preprocessing and omission of the translation phase, demonstrated that the proposed gradient boosting framework can effectively cope with both high-dimensional and imbalanced datasets and performs significantly better than a plethora of traditional machine learning classification approaches in terms of precision and recall measures.
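The hybrid resampling idea can be illustrated with a small sketch. The abstract only states that "a variation" of under-sampling and over-sampling was used, so the code below is not the authors' technique: it shows plain random under-sampling of larger classes and random over-sampling (with replacement) of smaller ones toward a common size. The function name `hybrid_resample`, the `target_per_class` parameter, and the toy comments are hypothetical.

```python
import random
from collections import Counter


def hybrid_resample(samples, labels, target_per_class, seed=0):
    """Bring every class to `target_per_class` items.

    Classes larger than the target are randomly under-sampled without
    replacement; smaller classes are randomly over-sampled by duplicating
    existing items. This is a generic baseline, not the paper's variant.
    """
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)

    out_x, out_y = [], []
    for y, xs in by_class.items():
        if len(xs) > target_per_class:
            chosen = rng.sample(xs, target_per_class)  # under-sample
        else:
            extra = [rng.choice(xs) for _ in range(target_per_class - len(xs))]
            chosen = xs + extra  # over-sample by duplication
        out_x.extend(chosen)
        out_y.extend([y] * len(chosen))
    return out_x, out_y


# Hypothetical imbalanced data: 8 negative comments, 2 positive ones.
comments = [f"neg_{i}" for i in range(8)] + ["pos_0", "pos_1"]
sentiments = ["neg"] * 8 + ["pos"] * 2

balanced_x, balanced_y = hybrid_resample(comments, sentiments, target_per_class=5)
print(Counter(balanced_y))
```

Choosing a target between the minority and majority sizes limits both the information discarded from the majority class and the number of duplicated minority items. Resampling should be applied to the training split only, so that evaluation reflects the original class distribution.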
Abstract. Over the last decade there has been extensive and continuously growing creation of political content on the Internet, and especially in the Web 2.0 social media, which can be quite useful for government agencies in order to understand the needs and problems of societies and formulate effective public policies for addressing them. A variety of ICT-based methods have therefore been developed for the exploitation of this political content by governments ('citizensourcing'), initially simpler and later more sophisticated ones. These ICT-based methods are increasingly based on the use of opinion mining (OM) and sentiment analysis (SA) techniques, in order to process the extensive political content collected from numerous sources. This paper describes a novel approach to OM and SA use, created as part of an advanced ICT-based method of exploiting political content created on the Internet, and especially in social media, by experts ('expertsourcing'), aiming to leverage the extensive policy community of the European Union, which is developed in the European EU-Community project. Some first experimental results are also presented.