Amid the ongoing health crisis, there is a growing need to discern possible signs of Wellness Dimensions (WD) manifested in self-narrated text. As the distribution of WD in social media data is intrinsically imbalanced, we experiment with generative NLP models for data augmentation to enable further improvement in the prescreening task of classifying WD. To this end, we propose a simple yet effective data augmentation approach based on prompt-based generative NLP models, and evaluate the ROUGE scores and the syntactic/semantic similarity between existing interpretations and augmented data. Our approach with the ChatGPT model surpasses all the other methods and improves over baselines such as Easy Data Augmentation and Backtranslation. Introducing data augmentation to generate more training samples and a balanced dataset improves the F-score and the Matthews Correlation Coefficient by up to 13.11% and 15.95%, respectively.
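For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how the F-score (here F1) and the Matthews Correlation Coefficient are computed from confusion-matrix counts. It assumes a binary, one-vs-rest view of a single wellness dimension; the function name `f1_and_mcc` and the counts in the usage lines are illustrative assumptions, not values from the paper.

```python
import math


def f1_and_mcc(tp, fp, fn, tn):
    """Return (F1, MCC) for binary confusion-matrix counts.

    Hypothetical helper for illustration; not the authors' evaluation code.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    pr_sum = precision + recall
    f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0

    # MCC uses all four cells, so it stays informative on imbalanced classes.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return f1, mcc


# Illustrative, made-up counts for a rare class (10 positives out of 100).
f1, mcc = f1_and_mcc(tp=5, fp=5, fn=5, tn=85)
print(f"F1={f1:.3f}  MCC={mcc:.3f}")
```

Because accuracy on these made-up counts would be 90% while F1 and MCC are far lower, the sketch also hints at why the abstract reports these two metrics for an imbalanced dataset.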
With the popularity of Online Social Networks (OSN), the number of different types of digital attacks has increased, causing considerable damage to their users. The Identity Clone Attack (ICA) is one of the most prominent among them: it illegally uses the information of a genuine user by duplicating it in a fake profile. These attacks severely affect a genuine and innocent identity, since it can be misused by a malicious profile. Hence, these clone profiles need to be identified and removed in order to improve the protection of users. This study introduces a model to detect clone profiles on Facebook using a dataset that consists of profiles with attributes and network connections. Although the initial dataset was real, it was modified to include some artificial clones. Detection proceeds in three sequential stages: filtering by name, clustering on weighted categorical attributes, and measuring the strength of friend relationships among profiles. Finally, the model outputs a list of possible clones, each with a percentage representing its degree of duplication of a given victim profile. Presenting these duplication percentages instead of asserting exact clones makes the approach more practical, since many profiles are similar without being clones. Using the agglomerative hierarchical clustering algorithm and the Jaccard similarity measure, the results show a low average within-cluster distance (the cluster density performance measure) and a precision of 88.75%. The present study focuses heavily on the distribution of the dataset: the attribute weights, the similarity threshold, and even the choice of clustering algorithm are derived from it, which makes the proposed model adaptable to other datasets. 
As future improvements, the proposed approach can be extended to find clones of a victim across different platforms, and more attributes can be considered for clustering.
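The three-stage idea described in this abstract can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the agglomerative clustering stage is replaced by a plain weighted attribute-match score, and the profile layout (`id`, `name`, `attrs`, `friends`), the weights, and the mixing parameter `alpha` are all hypothetical. Only the Jaccard measure on friend sets is taken directly from the abstract.

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two collections."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def weighted_attribute_similarity(p, q, weights):
    """Share of total attribute weight on which two profiles agree.

    Stand-in for the clustering stage (assumption, not the paper's method).
    """
    total = sum(weights.values())
    matched = sum(
        w for attr, w in weights.items()
        if p.get(attr) is not None and p.get(attr) == q.get(attr)
    )
    return matched / total if total else 0.0


def clone_candidates(victim, profiles, weights, alpha=0.5):
    """Rank same-name profiles by a duplication percentage (0 to 100)."""
    target = victim["name"].strip().lower()
    results = []
    for prof in profiles:
        # Stage 1: filter by name.
        if prof["name"].strip().lower() != target:
            continue
        # Stage 2 (simplified): weighted categorical attribute agreement.
        attr_sim = weighted_attribute_similarity(
            victim["attrs"], prof["attrs"], weights
        )
        # Stage 3: strength of shared friend relationships.
        friend_sim = jaccard(victim["friends"], prof["friends"])
        score = 100 * (alpha * attr_sim + (1 - alpha) * friend_sim)
        results.append((prof["id"], round(score, 2)))
    return sorted(results, key=lambda r: r[1], reverse=True)


# Made-up example data.
victim = {"id": "v", "name": "Ann Lee",
          "attrs": {"city": "Colombo", "school": "X"},
          "friends": {1, 2, 3, 4}}
profiles = [
    {"id": "p1", "name": "ann lee",
     "attrs": {"city": "Colombo", "school": "X"}, "friends": {2, 3, 4, 5}},
    {"id": "p2", "name": "Ann Lee",
     "attrs": {"city": "Kandy", "school": "X"}, "friends": {9}},
    {"id": "p3", "name": "Bob Ray", "attrs": {}, "friends": {1, 2}},
]
weights = {"city": 2, "school": 1}
ranked = clone_candidates(victim, profiles, weights)
print(ranked)
```

Returning a ranked list of percentages rather than a yes/no label mirrors the abstract's point that many profiles are similar without being clones; the threshold for acting on a score is left to the caller.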