In the era of big data and artificial intelligence, online risk detection has become a popular research topic. From detecting online harassment to the sexual predation of youth, the state-of-the-art in computational risk detection has the potential to protect particularly vulnerable populations from online victimization. Yet, this is a high-risk, high-reward endeavor that requires a systematic and human-centered approach to synthesize disparate bodies of research across different application domains, so that we can identify best practices and potential gaps, and set a strategic research agenda for leveraging these approaches in a way that betters society. Therefore, we conducted a comprehensive literature review to analyze 73 peer-reviewed articles on computational approaches utilizing text or metadata/multimedia for online sexual risk detection. We identified sexual grooming (75%), sex trafficking (12%), and sexual harassment and/or abuse (12%) as the three types of sexual risk detection present in the extant literature. Furthermore, we found that the majority (93%) of this work has focused on identifying sexual predators after-the-fact, rather than taking more nuanced approaches to identify potential victims and problematic patterns that could be used to prevent victimization before it occurs. Many studies rely on public datasets (82%) and third-party annotators (33%) to establish ground truth and train their algorithms. Finally, the majority of this work (78%) focused on evaluating the algorithmic performance of the models, and few studies (4%) evaluated these systems with real users. Thus, we urge computational risk detection researchers to integrate more human-centered approaches to both developing and evaluating sexual risk detection algorithms to ensure the broader societal impacts of this important work.
Sexual exploration is a natural part of adolescent development; yet, unmediated internet access has enabled teens to engage in a wider variety of potentially riskier sexual interactions than previous generations, from normatively appropriate sexual interactions to sexually abusive situations. Teens have turned to online peer support platforms to disclose and seek support about these experiences. Therefore, we analyzed posts (N=45,955) made by adolescents (ages 13--17) on an online peer support platform to deeply examine their online sexual risk experiences. By applying a mixed methods approach, we 1) accurately (average AUC = 0.90) identified posts that contained teen disclosures about online sexual risk experiences and classified the posts based on level of consent (i.e., consensual, non-consensual, sexual abuse) and relationship type (i.e., stranger, dating/friend, family) between the teen and the person with whom they shared the sexual experience, 2) detected statistically significant differences in the proportions of posts based on these dimensions, and 3) further unpacked the nuance in how these online sexual risk experiences were typically characterized in the posts. Teens were significantly more likely to engage in consensual sexting with friends/dating partners; unwanted solicitations were more likely from strangers, and sexual abuse was more likely when a family member was involved. We contribute to the HCI and CSCW literature around youth online sexual risk experiences by moving beyond the false dichotomy of "safe" versus "risky". Our work provides a deeper understanding of technology-mediated adolescent sexual behaviors from the perspectives of sexual well-being, risk detection, and the prevention of online sexual violence toward youth.
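The proportion comparisons described above can be sketched with a chi-square test of independence between relationship type and consent level. This is a minimal illustration only; the counts below are hypothetical placeholders, not the paper's actual data.

```python
# Sketch: test whether consent level is independent of relationship
# type across posts, in the spirit of the proportion comparisons above.
from scipy.stats import chi2_contingency

# Hypothetical counts (NOT the paper's data).
# Rows: relationship type (stranger, dating/friend, family)
# Columns: consent level (consensual, non-consensual, sexual abuse)
counts = [
    [20, 120, 15],   # stranger
    [180, 40, 5],    # dating/friend
    [5, 10, 30],     # family
]

chi2, p, dof, expected = chi2_contingency(counts)
print(f"chi2={chi2:.1f}, p={p:.3g}, dof={dof}")
```

A small p-value here would indicate that the distribution of consent levels differs by relationship type, mirroring the kind of statistically significant differences the abstract reports.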
We collected Instagram data from 150 adolescents (ages 13-21) that included 15,547 private message conversations, of which 326 conversations were flagged as sexually risky by participants. Based on this data, we leveraged a human-centered machine learning approach to create sexual risk detection classifiers for youth social media conversations. Our Convolutional Neural Network (CNN) and Random Forest models outperformed other approaches in identifying sexual risks at the conversation level (AUC=0.88), and the CNN outperformed at the message level (AUC=0.85). We also trained classifiers to detect the risk severity level (i.e., safe, low, medium-high) of a given message, with the CNN outperforming other models (AUC=0.88). A feature analysis yielded deeper insights into patterns found within sexually safe versus unsafe conversations. We found that contextual features (e.g., age, gender, and relationship type) and Linguistic Inquiry and Word Count (LIWC) features contributed the most to accurately detecting sexual conversations that made youth feel uncomfortable or unsafe. Our analysis provides insights into the important factors and contextual features that enhance automated detection of sexual risks within youths' private conversations. As such, we make valuable contributions to the computational risk detection and adolescent online safety literature through our human-centered approach of collecting and ground-truth coding private social media conversations of youth for the purpose of risk classification.
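The conversation-level classification idea can be sketched with scikit-learn. This is an illustrative toy, not the authors' pipeline: the four conversations and their labels are invented, and TF-IDF n-grams stand in for the LIWC and contextual features (age, gender, relationship type) the paper found most useful.

```python
# Minimal sketch of conversation-level sexual risk classification.
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Hypothetical toy conversations standing in for the private
# Instagram messages (1 = flagged as risky, 0 = safe).
conversations = [
    "hey want to trade pics dont tell anyone",
    "send me something private i wont share it",
    "did you finish the math homework for tomorrow",
    "we should study together after school today",
]
labels = [1, 1, 0, 0]

# TF-IDF unigrams/bigrams as a stand-in feature set; a Random Forest
# mirrors one of the model families compared in the paper. Bootstrap
# is disabled so every tree sees all four toy examples.
clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    RandomForestClassifier(n_estimators=100, bootstrap=False, random_state=0),
)
clf.fit(conversations, labels)

# Score a new (hypothetical) conversation; the class-1 probability is
# a risk score that could be thresholded into severity levels.
risk = clf.predict_proba(["can you send a pic just between us"])[0][1]
```

In practice, the paper's approach differs in important ways: it uses labeled data flagged by the youth participants themselves, richer contextual and LIWC features, and a CNN for message-level detection; the sketch only conveys the overall train-then-score structure.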