Towards a Linguistic Stylometric Model for the Authorship Detection in Cybercrime Investigations

Omar, Abdulfattah; Deraan, Aldawsari Bader

doi:10.5539/ijel.v9n5p182

Cited by 4 publications

(4 citation statements)

References 19 publications

“…The approach is widely used due to its conceptual simplicity and ease of determining semantic similarity within documents (Zhiguo, Luo, Chen, Wang & Lei, 2011). One problem with VSC, however, is that it cannot deal with short documents effectively due to sparsity (Amensisa, Patil & Agrawal, 2018; Moisl & Maguire, 2008; Omar & Aldawsari, 2019). Given the nature of the lyrics in this study, the GSDMM technique developed by Yin and Wang (Yin & Wang, 2014) was selected.…”

Section: Methods (mentioning)

confidence: 99%

Authorship attribution of Morsi Gameel Aziz’s lyrics: A clustering-based stylometry approach

Omar

2021

JLLS

Self Cite

Numerous studies have addressed the issue of the authorship of Morsi Gameel Aziz's lyrics. These studies have traditionally been based on chronological criteria for determining the real authors of disputed lyrics. To date, there is no agreement on the real authors of these disputed lyrics. This can mainly be attributed to both selectivity and the lack of empirical evidence in such studies, raising questions about the reliability of such approaches. With the advent of machine learning systems and data mining techniques, it is now possible to process thousands of texts using replicable methods. Thus, this study seeks to address the issue of the authorship of Morsi Gameel Aziz's lyrics making use of these advances by applying a clustering-based stylometry approach. The hypothesis is that lyrics grouped or clustered together are more likely to be written by the same poet. A corpus of 1,089 lyrics was built, including all known lyrics attributed to Aziz and the lyrics of the poets thought to be the real authors of the disputed lyrics. The lyrics were clustered using the Gibbs sampling Dirichlet multinomial mixture (GSDMM) technique and were assigned to four main classes, with the 12 disputed lyrics clustered within Aziz's class. Based on this, it is clear that the GSDMM model is effective and reliable in clustering short documents in Arabic. The results of the study show that machine learning systems and stylometric authorship techniques can be used in resolving many authorship questions that remain controversial and unanswered in Arabic literature.

“…For example, Ishihara (2017) demonstrated how forensic text comparison could be used on chat conversations of various lengths from 500 to 2500 tokens. Omar and Deraan (2019) found that the inclusion of different variables into an integrated system leads to improved Authorship Attribution performance on short texts. A combination of analysing lexical features and letter-pair frequencies resulted in an accuracy of 76%.…”

Section: The Linguistics Of Grooming (mentioning)

confidence: 99%

Conceptualizing an AI-based Police Robot for Preventing Online Child Sexual Exploitation and Abuse:

Sunde, Sunde

2022

NJSP

Child sexual exploitation and abuse (CSEA) needs more attention from a crime prevention perspective. This is the first in a two-part series about the PrevBOT concept, which is an automated tool supporting the police in preventing CSEA in online chat rooms. Part I presents the concept, its theoretical framework, and the technology. Equipped with technology for Authorship Analysis, the tool can identify problematic digital spaces unsafe for children. Given the advancements in machine learning algorithms, PrevBOT may provide predictions concerning age and gender behind online aliases engaged in sexualized speech with children and assist the police in identifying former CSEA offenders who resume the criminal activity online. Part II provides a legal analysis of issues relating to data protection, privacy, and fair trial.

“…In the case of a properly constructed anonymous profile, additional specialist expertise may also be needed in order to link it to a specific person. A good example is the analysis of the user's language-use habits, which can also provide information about their social and community affiliation, origin, and education (Omar, 2019).…”

Section: Open-Source Data Collection: The Solution? (unclassified)

Challenges of cybercrime investigations

Herédi

2022

BSZ

Among the present-day challenges facing law enforcement agencies, ensuring the effective detection of cybercrime stands out. In general, this category of crime is characterized not only by a high degree of latency but also by investigative difficulties that can significantly slow the course of proceedings or even make their successful conclusion impossible. As internet penetration among the population grows, more and more platforms and services appear online, which also creates opportunities for new forms of offending behavior. The anonymity afforded by cyberspace acts as an incentive for offenders, since the chance of being caught is much lower than with contact offenses, and in the case of property-related crimes the damage caused is also much greater. The investigative difficulties are the same across the different categories of cybercrime, whether the proceedings concern a simpler offense committed using an information system or a cyberattack carried out in a sophisticated manner. In most cases these difficulties are linked to the use of services that conceal online identity, to technical novelties, and to shortcomings arising from international cooperation between authorities. Possible solutions lie in the use of direct forms of international cooperation (in particular 24/7 contact networks), in the active use of open-source data collection and online profiling, and in the regular further training of law enforcement personnel. If a given specialist question still exceeds the knowledge of the investigating body's staff, it may be advisable to involve specialist advisers or experts with special expertise in the proceedings.
