April Kontostathis scite author profile

Abstract: Cyberbullying is the use of technology as a medium to bully someone. Although it has been an issue for many years, the recognition of its impact on young people has recently increased. Social networking sites provide a fertile medium for bullies, and teens and young adults who use these sites are vulnerable to attacks. Through machine learning, we can detect language patterns used by bullies and their victims, and develop rules to automatically detect cyberbullying content. The data we used for our project was collected from the website Formspring.me, a question-and-answer formatted website that contains a high percentage of bullying content. The data was labeled using a web service, Amazon's Mechanical Turk. We used the labeled data, in conjunction with machine learning techniques provided by the Weka toolkit, to train a computer to recognize bullying content. Both a C4.5 decision tree learner and an instance-based learner were able to identify the true positives with 78.5% accuracy.

A Survey of Emerging Trend Detection in Textual Data Mining

Kontostathis

Galitsky

Pottenger

et al. 2004

119

Overview: In this chapter we describe several systems that detect emerging trends in textual data. Some of the systems are semiautomatic, requiring user input to begin processing, and others are fully automatic, producing output from the input corpus without guidance. For each Emerging Trend Detection (ETD) system we describe components including linguistic and statistical features, learning algorithms, training and test set generation, visualization, and evaluation. We also provide a brief overview of several commercial products with capabilities of detecting trends in textual data, followed by an industrial viewpoint describing the importance of trend detection tools, and an overview of how such tools are used. This review of the literature indicates that much progress has been made toward automating the process of detecting emerging trends, but there is room for improvement. All of the projects we reviewed rely on a human domain expert to separate the emerging trends from noise in the system. Furthermore, we discovered that few projects have used formal evaluation methodologies to determine the effectiveness of the systems being created. Development and use of effective metrics for evaluation of ETD systems is critical. Work continues on the semiautomatic and fully automatic systems we are developing at Lehigh University [HDD]. In addition to adding formal evaluation components to our systems, we are also researching methods for automatically developing training sets and for merging machine learning and visualization to develop more effective ETD applications.

M. W. Berry (ed.), Survey of Text Mining

A framework for understanding Latent Semantic Indexing (LSI) performance

Kontostathis

Pottenger

2006

Information Processing & Management

145

Learning to Identify Internet Sexual Predation

McGhee

Bayzick

Kontostathis

et al. 2011

International Journal of Electronic Commerce

Cyberbullying, Race/Ethnicity and Mental Health Outcomes: A Review of the Literature

Edwards

Kontostathis

Fisher

2016

MaC

Cyberbullying is a relatively new phenomenon associated with the widespread adoption of various digital communication technologies, including the internet and mobile phones. As of 2013, nearly 20% of youths in grades 9-12 in the US reported being traditionally bullied in face-to-face encounters while almost 15% reported being cyberbullied (Kann et al., 2014). Bullying victimization is associated with a variety of behavioral and psychological effects, from becoming bullies themselves (i.e., bully-victims), to poor academic performance, depression and suicidal ideation (Nansel et al., 2001; Wang, Nansel, & Iannotti, 2011; Willard, 2007). Research on these phenomena has focused primarily on white youth, leaving a void in our understanding of how cyberbullying has affected youth of color. This narrative literature review addresses this oversight by providing an overview of recent cyberbullying research that focuses on Hispanic, Asian and black adolescents (k=15). We found that youth of color appear to be less likely to experience cyberbullying than white youth, but they experience suicidal ideation and attempts at about the same rates when they do experience cyberbullying.

Text Mining and Cybercrime

Kontostathis

Edwards

Leatherman

2010

Essential Dimensions of Latent Semantic Indexing (LSI)

Kontostathis

2007

Latent Semantic Indexing (LSI) is commonly used to match queries to documents in information retrieval information. We then test this model by developing a modified version of LSI that captures this information, Essential Dimensions of LSI (EDLSI). EDLSI significantly improves retrieval performance on corpora that previously did not benefit from LSI, and offers improved runtime performance when compared with traditional LSI. Traditional LSI requires the use of a dimensionality reduction parameter which must be tuned for each collection. Applying our model, we have also shown that a small, fixed dimensionality reduction parameter (k=10) can be used to capture the term relationship information in a corpus.

Identification of Critical Values in Latent Semantic Indexing

Kontostathis

Pottenger

Davison

2005

In this chapter we analyze the values used by Latent Semantic Indexing (LSI) for information retrieval. By manipulating the values in the Singular Value Decomposition (SVD) matrices, we find that a significant fraction of the values have little effect on overall performance, and can thus be removed (changed to zero). This allows us to convert the dense term-by-dimension and document-by-dimension matrices into sparse matrices by identifying and removing those entries. We empirically show that these entries are unimportant by presenting retrieval and runtime performance results, using seven collections, which show that removal of up to 70% of the values in the term-by-dimension matrix results in similar or improved retrieval performance (as compared to LSI). Removal of 90% of the values degrades retrieval performance slightly for smaller collections, but improves retrieval performance by 60% on the large collection we tested. Our approach additionally has the computational benefit of reducing memory requirements and query response time.
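
The sparsification step described in this abstract can be illustrated with a minimal sketch. The abstract does not state how entries are selected for removal, so the sketch assumes the entries with the smallest absolute value are the ones zeroed. The function name `sparsify` and the toy term-by-dimension matrix are hypothetical and are not taken from the chapter.

```python
def sparsify(matrix, fraction):
    """Return a copy of `matrix` with the given fraction of entries set to zero.

    Assumption (not from the source): entries are ranked by absolute value
    and the smallest ones are removed first.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")

    # Record every entry as (magnitude, row, column) and sort by magnitude.
    entries = [
        (abs(value), i, j)
        for i, row in enumerate(matrix)
        for j, value in enumerate(row)
    ]
    entries.sort()

    # Number of entries to zero out, rounded down.
    n_remove = int(len(entries) * fraction)

    # Work on a copy so the dense input matrix is left untouched.
    result = [list(row) for row in matrix]
    for _, i, j in entries[:n_remove]:
        result[i][j] = 0.0
    return result


# Hypothetical 4-term by 3-dimension matrix (illustrative values only).
term_by_dim = [
    [0.62, -0.05, 0.01],
    [0.48, 0.31, -0.02],
    [-0.03, 0.77, 0.04],
    [0.09, -0.06, 0.55],
]

# Zero out 70% of the 12 entries (8 entries after rounding down).
sparse = sparsify(term_by_dim, 0.7)
zero_count = sum(value == 0.0 for row in sparse for value in row)
```

In this toy example only the four largest-magnitude entries survive, so the matrix can be stored in a sparse format, which is the source of the memory and query-time savings the abstract mentions.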
