Over the past decade, large volumes of online information have spread rapidly among a growing number of social network users. This phenomenon has often been exploited by malicious users and entities, which forge, distribute, and reproduce fake news and propaganda. In this paper, we present a novel approach to the automatic detection of fake news on Twitter that involves (a) pairwise text input, (b) a novel deep neural network learning architecture that allows for flexible input fusion at various network layers, and (c) various input modes, such as word embeddings and both linguistic and network account features. Furthermore, tweets are innovatively separated into news headers and news text, and an extensive experimental setup performs classification tests using both. Our main results show high overall accuracy in fake news detection. The proposed deep learning architecture outperforms the state-of-the-art classifiers, while using fewer features and embeddings from the tweet text.
Mining social web text has been at the heart of Natural Language Processing and Data Mining research for the last 15 years. Though most of the reported work is on widely spoken languages, such as English, approaches that deal with less commonly spoken languages, such as Greek, are clearly significant for preserving and documenting minority languages and cultural and ethnic diversity, and for identifying intercultural similarities and differences. The present work aims at identifying, documenting and comparing social text data sets, as well as mining techniques and applications on social web text that target Modern Greek, focusing on the arising challenges and the potential for future research in this less widely spoken language.
Due to the evolution of cyberattacks, the need to deploy modern cybersecurity learning environments is constantly increasing. Furthermore, the security challenges of the digital era are inevitably connected to the human factor, which makes it urgent for personnel to become familiar with state-of-the-art security software for detecting and mitigating cyberattacks. Cybersecurity learning environments include, among others, virtual labs, Cyber Ranges and Capture the Flag (CTF) challenges, providing interactive digital learning environments for trainees to engage in complex cybersecurity scenarios. However, scenario design is not always governed by specific design principles, and the learning outcomes of the cybersecurity exercises are not always clearly defined. In this paper, a cybersecurity educational framework was used to guide the design of cybersecurity scenarios, and a Cyber Range to host the scenarios was developed as a proof-of-concept. This work suggests that educators could use the proposed design methodology to design cybersecurity learning environments, including, but not limited to, Cyber Ranges.
Highly skilled migrants and refugees finding employment in low-skill vocations, despite their professional qualifications and educational background, has become a global trend, mainly due to the language barrier. Employment prospects for displaced communities are mostly decided by their knowledge of the sublanguage of the vocational domain they are interested in working in. Common vocational domains include agriculture, cooking, crafting, construction, and hospitality. The increasing amount of user-generated content in wikis and social networks provides a valuable source of data for data mining, Natural Language Processing and machine learning applications. This paper extends the contribution of the authors’ previous research on automatic vocational domain identification by further analyzing the results of the machine learning experiments with the domain-specific textual data set, considering two research directions: (a) prediction analysis and (b) data balancing. Analyzing the wrong predictions and the features that contributed to misclassification, along with the correct predictions and their most dominant features, led to the identification of a primary set of terms for the vocational domains. Data balancing techniques were applied to the data set to observe their impact on the performance of the classification model. A novel 4-step methodology is proposed in this paper, consisting of successive applications of SMOTE oversampling on imbalanced data. Data oversampling obtains better results than data undersampling in imbalanced data sets, while hybrid approaches perform reasonably well.
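As an illustration of what successive SMOTE oversampling can look like, the following is a minimal sketch and not the authors' 4-step methodology, whose individual steps are not given in the abstract. It assumes one plausible reading: each minority class is oversampled in turn, rarest first, until it matches the size of the majority class. The function names `smote` and `successive_smote`, the parameter `k`, and the numeric feature vectors are all hypothetical choices made for this example.

```python
import random
from collections import Counter


def smote(minority, n_new, k=5, rng=None):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (basic SMOTE idea)."""
    rng = rng or random.Random(0)
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # All other minority points, sorted by squared Euclidean distance to base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic


def successive_smote(X, y, k=5, seed=0):
    """ASSUMED scheme: one SMOTE pass per minority class, rarest class first,
    each pass bringing that class up to the majority class size."""
    rng = random.Random(seed)
    X, y = list(X), list(y)
    counts = Counter(y)
    target = max(counts.values())
    order = sorted((c for c in counts if counts[c] < target), key=lambda c: counts[c])
    for label in order:
        minority = [x for x, lab in zip(X, y) if lab == label]
        new_points = smote(minority, target - len(minority), k, rng)
        X.extend(new_points)
        y.extend([label] * len(new_points))
    return X, y


if __name__ == "__main__":
    # Toy 2-D data with three imbalanced classes (made-up values).
    X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.3, 0.3), (0.1, 0.1), (0.2, 0.2),
         (5.0, 5.0), (5.5, 5.2), (5.2, 5.6),
         (9.0, 1.0), (9.4, 1.2)]
    y = ["a"] * 6 + ["b"] * 3 + ["c"] * 2
    Xb, yb = successive_smote(X, y, k=2)
    print(Counter(yb))
```

In practice, text features such as TF-IDF vectors would replace the toy 2-D points, and an established implementation would normally be preferred over this hand-written interpolation.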