Cyberbullying is the use of technology as a medium to bully someone. Although it has been an issue for many years, recognition of its impact on young people has increased only recently. Social networking sites provide a fertile medium for bullies, and teens and young adults who use these sites are vulnerable to attacks. Through machine learning, we can detect language patterns used by bullies and their victims, and develop rules to automatically detect cyberbullying content. The data we used for our project was collected from the website Formspring.me, a question-and-answer website that contains a high percentage of bullying content. The data was labeled using a web service, Amazon's Mechanical Turk. We used the labeled data, in conjunction with machine learning techniques provided by the Weka toolkit, to train a computer to recognize bullying content. Both a C4.5 decision tree learner and an instance-based learner were able to identify the true positives with 78.5% accuracy.
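To illustrate what an instance-based learner does, here is a minimal sketch of a k-nearest-neighbour classifier. It is not Weka's implementation, and the features (counts of insult words and second-person pronouns), the word lists, and the toy training posts are all hypothetical stand-ins, since the abstract does not describe the project's actual features. The learner simply stores labelled examples and labels a new post by majority vote among its closest stored neighbours.

```python
import math
from collections import Counter

# Hypothetical word lists standing in for real features; the project's
# actual feature set is not described in the abstract.
INSULTS = {"idiot", "loser", "ugly", "stupid"}
SECOND_PERSON = {"you", "your", "ur", "u"}


def features(text):
    """Map a post to a tiny numeric vector: (insult count, 2nd-person count)."""
    words = text.lower().split()
    return (
        sum(w in INSULTS for w in words),
        sum(w in SECOND_PERSON for w in words),
    )


def knn_predict(train, text, k=3):
    """Instance-based prediction: majority label among the k stored
    instances closest (Euclidean distance) to the new post."""
    query = features(text)
    ranked = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Made-up toy examples, labelled by hand for illustration only.
TRAIN = [
    (features(t), label)
    for t, label in [
        ("you are such an idiot", "bully"),
        ("ur so ugly and stupid", "bully"),
        ("you loser", "bully"),
        ("what is your favorite movie", "ok"),
        ("i like pizza", "ok"),
        ("see you at school tomorrow", "ok"),
    ]
]

print(knn_predict(TRAIN, "you are so stupid"))  # bully
print(knn_predict(TRAIN, "do you like pizza"))  # ok
```

Unlike a decision tree such as C4.5, nothing is "learned" up front here: all of the work happens at prediction time, when the new post is compared against every stored instance.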
In this chapter we describe several systems that detect emerging trends in textual data. Some of the systems are semiautomatic, requiring user input to begin processing, and others are fully automatic, producing output from the input corpus without guidance. For each Emerging Trend Detection (ETD) system we describe components including linguistic and statistical features, learning algorithms, training and test set generation, visualization, and evaluation. We also provide a brief overview of several commercial products capable of detecting trends in textual data, followed by an industrial viewpoint describing the importance of trend detection tools and an overview of how such tools are used. This review of the literature indicates that much progress has been made toward automating the process of detecting emerging trends, but there is room for improvement. All of the projects we reviewed rely on a human domain expert to separate the emerging trends from noise in the system. Furthermore, we discovered that few projects have used formal evaluation methodologies to determine the effectiveness of the systems being created. Development and use of effective metrics for evaluating ETD systems is critical. Work continues on the semiautomatic and fully automatic systems we are developing at Lehigh University [HDD]. In addition to adding formal evaluation components to our systems, we are also researching methods for automatically developing training sets and for merging machine learning and visualization to develop more effective ETD applications. In M. W. Berry (ed.), Survey of Text Mining.
Cyberbullying is a relatively new phenomenon associated with the widespread adoption of various digital communication technologies, including the internet and mobile phones. As of 2013, nearly 20% of youths in grades 9-12 in the US reported being traditionally bullied in face-to-face encounters, while almost 15% reported being cyberbullied (Kann et al., 2014). Bullying victimization is associated with a variety of behavioral and psychological effects, from becoming bullies themselves (i.e., bully-victims) to poor academic performance, depression and suicidal ideation (Nansel et al., 2001; Wang, Nansel, & Iannotti, 2011; Willard, 2007). Research on these phenomena has focused primarily on white youth, leaving a void in our understanding of how cyberbullying has affected youth of color. This narrative literature review addresses this oversight by providing an overview of recent cyberbullying research that focuses on Hispanic, Asian and black adolescents (k = 15). We found that youth of color appear to be less likely than white youth to experience cyberbullying, but when they do experience it, they report suicidal ideation and attempts at about the same rates.
Latent Semantic Indexing (LSI) is commonly used to match queries to documents in information retrieval. We then test this model by developing a modified version of LSI that captures this information, Essential Dimensions of LSI (EDLSI). EDLSI significantly improves retrieval performance on corpora that previously did not benefit from LSI, and offers improved runtime performance when compared with traditional LSI. Traditional LSI requires a dimensionality reduction parameter that must be tuned for each collection. Applying our model, we have also shown that a small, fixed dimensionality reduction parameter (k = 10) can be used to capture the term relationship information in a corpus.
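The following is a minimal pure-Python sketch of the scoring idea, written with dense lists for readability. Only the small fixed `k = 10` comes from the abstract. Everything else is an assumption: raw term counts with no weighting, dot-product rather than cosine scores, top singular vectors found by simple power iteration instead of a production SVD routine, and EDLSI modelled as a weighted blend `x * LSI + (1 - x) * vector-space` with a hypothetical default `x = 0.2`. The helper names are illustrative and are not from the paper.

```python
import math
import random


def transpose(M):
    return [list(col) for col in zip(*M)]


def matvec(M, v):
    return [sum(a * b for a, b in zip(row, v)) for row in M]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def orthonormalise(v, basis):
    """One Gram-Schmidt step: strip components along `basis`, then normalise.
    Returns None if nothing is left (v lay in the span of `basis`)."""
    for b in basis:
        c = dot(v, b)
        v = [x - c * y for x, y in zip(v, b)]
    norm = math.sqrt(dot(v, v))
    return None if norm < 1e-12 else [x / norm for x in v]


def top_right_singular_vectors(A, k, iters=300, seed=0):
    """Approximate the top-k right singular vectors of A (terms x documents)
    by power iteration on A^T A, keeping each new vector orthogonal to the
    ones already found."""
    docs = transpose(A)                                # one column per document
    B = [[dot(ci, cj) for cj in docs] for ci in docs]  # A^T A, docs x docs
    rng = random.Random(seed)
    basis = []
    for _ in range(min(k, len(docs))):
        v = orthonormalise([rng.random() for _ in docs], basis)
        for _ in range(iters):
            if v is None:  # A has rank < k: no further directions
                return basis
            v = orthonormalise(matvec(B, v), basis)
        if v is None:
            return basis
        basis.append(v)
    return basis


def lsi_scores(A, q, k):
    """Rank-k LSI scores q^T A_k, computed as (q^T A) V_k V_k^T."""
    V = top_right_singular_vectors(A, k)
    qA = matvec(transpose(A), q)  # plain vector-space scores, one per document
    scores = [0.0] * len(qA)
    for v in V:
        c = dot(qA, v)
        scores = [s + c * x for s, x in zip(scores, v)]
    return scores


def edlsi_scores(A, q, k=10, x=0.2):
    """Blend low-rank LSI scores with plain vector-space scores (x is assumed)."""
    vsm = matvec(transpose(A), q)
    lsi = lsi_scores(A, q, k)
    return [x * l + (1 - x) * s for l, s in zip(lsi, vsm)]


A = [[1, 0, 1], [1, 1, 0], [0, 1, 1], [0, 0, 1]]  # toy 4-term x 3-document matrix
print(edlsi_scores(A, [1, 0, 0, 0], k=1))
```

The identity `A V_k V_k^T = U_k S_k V_k^T` is what lets the sketch skip the left singular vectors. It also shows why `k` matters: when `k` equals the number of documents, the projection is the identity, so LSI scores collapse to plain vector-space scores, and a small `k` is where term-relationship information can change the ranking.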
In this chapter we analyze the values used by Latent Semantic Indexing (LSI) for information retrieval. By manipulating the values in the Singular Value Decomposition (SVD) matrices, we find that a significant fraction of the values have little effect on overall performance and can thus be removed (changed to zero). This allows us to convert the dense term-by-dimension and document-by-dimension matrices into sparse matrices by identifying and removing those entries. We empirically show that these entries are unimportant by presenting retrieval and runtime performance results on seven collections: removing up to 70% of the values in the term-by-dimension matrix yields similar or improved retrieval performance (as compared to LSI). Removing 90% of the values degrades retrieval performance slightly for the smaller collections, but improves retrieval performance by 60% on the large collection we tested. Our approach also has the computational benefit of reducing memory requirements and query response time.
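A minimal sketch of the sparsification step follows. It assumes that the "unimportant" entries are those with the smallest absolute value, which is a common heuristic. The abstract does not state the chapter's exact selection rule, and the function names, the toy matrix, and the 50% removal fraction are hypothetical. Survivors are kept in a dictionary-of-keys map, so memory and the cost of projecting a query both scale with the number of nonzeros rather than with the full matrix size.

```python
def sparsify(M, fraction):
    """Zero out the `fraction` of entries in dense matrix M with the smallest
    magnitude; return the survivors as a sparse {(row, col): value} dict."""
    entries = sorted(
        (abs(val), i, j) for i, row in enumerate(M) for j, val in enumerate(row)
    )
    n_remove = round(fraction * len(entries))
    return {(i, j): M[i][j] for _, i, j in entries[n_remove:]}


def sparse_rmatvec(S, q, n_cols):
    """Compute q^T S for a sparse matrix S, touching only stored entries.
    With S a term-by-dimension matrix, this projects a query vector q."""
    out = [0.0] * n_cols
    for (i, j), val in S.items():
        out[j] += q[i] * val
    return out


T = [[0.9, -0.05, 0.3],
     [0.01, -0.7, 0.2]]        # toy 2-term x 3-dimension matrix
S = sparsify(T, 0.5)           # drop the 3 smallest-magnitude entries
print(S)                       # {(0, 2): 0.3, (1, 1): -0.7, (0, 0): 0.9}
print(sparse_rmatvec(S, [1.0, 1.0], 3))  # [0.9, -0.7, 0.3]
```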