International Conference on Semantic Computing (ICSC 2007) 2007
DOI: 10.1109/icsc.2007.32

Toward Spotting the Pedophile: Telling victim from predator in text chats

Abstract: This paper presents the results of a pilot study on using automatic text categorization techniques in identifying online sexual predators. We report on our SVM and k-NN models. Our distance-weighted k-NN classifier reaches an f-measure of 0.943 on test data distinguishing the predator and the victim sides of text chats between sexual predators and volunteers posing as underage victims.

Citations: cited by 53 publications (29 citation statements)
References: 7 publications
“…As already mentioned, while previous studies were focused on classifying chat lines into different categories (McGhee et al, 2011) or distinguishing between offender and victim (Pendar, 2007), in this work we address the problem of revealing which high-level features are discriminative when distinguishing pedophiles' chats from non-pedophiles' ones.…”
Section: Our Approach (mentioning)
confidence: 99%
“…In contrast, this research is not about identifying users convincing others to provide some sexual favour. It neither aims at classification of chat lines into categories, as it was done by McGhee et al (2011), nor at discriminating between victim and pedophile as it was done by Pendar (2007). Our goal is to reveal semantic dimensions, i.e.…”
Section: Related Research (mentioning)
confidence: 99%
“…They combine text categorisation, category information provided by LIWC (Linguistic Inquiry and Word Count) and a Naïve Bayes classifier [32,33]. Unlike Pendar [30], they do not use pre-processing or spellchecking. McGhee et al [34] use a rule-based approach and k-NN (achieving 83% accuracy) for Group C, labelling conversations (partially similar to Olson et al [11]) as (i) Exchange of personal information; (ii) Grooming; (iii) Approach; (iv) None of the above.…”
Section: State-of-the-art (mentioning)
confidence: 99%
“…Group A: Distinguishing between predators and victims/children; Group B: Identifying inappropriate chat conversations with victims/children; Group C: Identifying the grooming/predator. Pendar [30] addresses Group A by removing stopwords, generating word unigrams, bigrams and trigrams, and using a Support Vector Machine (SVM) and k-NN to achieve an f-measure of 0.943. RahmanMiah et al [31] address Group B through three classes of chat: (i) Child Exploitation: adult-child sexual conversation; (ii) Sexual Fantasies: adult-adult sexually explicit conversation; (iii) General: conversations with no sexual content.…”
Section: State-of-the-art (mentioning)
confidence: 99%
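
The citation statements above describe the cited approach as stopword removal, word unigram/bigram/trigram features, and a distance-weighted k-NN classifier scored by f-measure. The following is a minimal sketch of that general recipe in plain Python, not the paper's implementation. The stopword list, the cosine distance, the 1/d vote weighting, the value of k, and the neutral toy data with labels "A" and "B" are all assumptions made for illustration; none of them is taken from the paper.

```python
import math
from collections import Counter

# Hypothetical stopword list (assumption); the paper's actual list is not given here.
STOPWORDS = {"the", "a", "an", "is", "to", "and", "of", "in"}


def ngram_features(text, max_n=3):
    """Count word unigrams, bigrams and trigrams after dropping stopwords."""
    tokens = [t for t in text.lower().split() if t not in STOPWORDS]
    feats = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            feats[" ".join(tokens[i:i + n])] += 1
    return feats


def cosine_distance(a, b):
    """1 - cosine similarity of two sparse count vectors (assumed metric)."""
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    # Clamp at 0 to guard against tiny negative values from rounding.
    return max(0.0, 1.0 - dot / (norm_a * norm_b))


def knn_predict(train, query, k=3, eps=1e-9):
    """Distance-weighted k-NN: each of the k nearest neighbours
    votes for its label with weight 1 / (distance + eps)."""
    q = ngram_features(query)
    nearest = sorted((cosine_distance(f, q), label) for f, label in train)[:k]
    votes = Counter()
    for dist, label in nearest:
        votes[label] += 1.0 / (dist + eps)
    return votes.most_common(1)[0][0]


def f_measure(gold, predicted, positive):
    """Balanced F-measure (harmonic mean of precision and recall)."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Neutral toy training data (assumption, unrelated to the paper's corpus).
train = [
    (ngram_features("what time does the train leave"), "A"),
    (ngram_features("the train leaves at noon"), "A"),
    (ngram_features("i like green apples"), "B"),
    (ngram_features("green apples taste sweet"), "B"),
]

prediction = knn_predict(train, "sweet green apples", k=3)
```

With inverse-distance weighting, a close neighbour counts for more than a distant one, so a query can be assigned to a class even when that class does not hold a plain majority among the k neighbours. Whether the paper used this exact weighting is not stated in the text above.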