Proceedings of the 2014 Conference on Research in Adaptive and Convergent Systems
DOI: 10.1145/2663761.2664221

Application of sim-hash algorithm and big data analysis in spam email detection system

Cited by 9 publications (2 citation statements); references 6 publications.

“…Simhashing has wide-ranging applications from detecting duplicates in texts (e.g. websites) to different security and to malware analysis, specifically with the Hamming distance similarity measure [25,26,27]. Inspired from NLP application domain, a n-gram is a contiguous sequence of n items (here, a byte pair) from a given sequence of the binary file.…”
Section: The Feature Extraction (mentioning)

“…While min-hash uses many hash values to represent a document, having each value computed with a different hash function, simhash gives a more compact output by reducing document vectors to a small sized real-valued fingerprints (Charikar, 2002). Simhash was successfully evaluated for duplicate detection of web pages (Manku et al, 2007;Henzinger, 2006), code segments (Uddin et al, 2011), short messages (Pi et al, 2009), spam (Ho et al, 2014) and academic papers (Williams and Giles, 2013). Our contribution to the literature is in the use of simhash fingerprinting for larger texts in form of digital books.…”
Section: Related Work (mentioning)
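The statement above contrasts min-hash, which stores one value per hash function, with simhash, which compresses a document into one short fingerprint. The Python sketch below puts both side by side on two spam-like messages. It assumes word tokens, a seeded MD5 hash family, 128 min-hash functions, and a near-duplicate threshold of k = 3 differing bits; k = 3 mirrors the setting Manku et al. (2007) reported for 64-bit web-page fingerprints, but applying it to e-mail is an assumption here. All function names and parameters are hypothetical, not the method of the papers cited.

```python
import hashlib
import re

def _hash64(seed: int, token: str) -> int:
    # Assumed hash family: MD5 over "seed:token", truncated to 64 bits.
    digest = hashlib.md5(f"{seed}:{token}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

def tokens(text: str) -> list[str]:
    """Lower-cased word tokens; punctuation and case differences are dropped."""
    return re.findall(r"\w+", text.lower())

def minhash_signature(words: list[str], num_hashes: int = 128) -> list[int]:
    """Min-hash: one minimum per hash function, i.e. num_hashes values per document."""
    unique = set(words)
    if not unique:
        raise ValueError("min-hash needs at least one token")
    return [min(_hash64(seed, w) for w in unique) for seed in range(num_hashes)]

def minhash_similarity(sig_a: list[int], sig_b: list[int]) -> float:
    """Fraction of agreeing positions, an estimate of the Jaccard similarity."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)

def simhash64(words: list[str]) -> int:
    """Simhash: a single 64-bit fingerprint built with one hash function."""
    counts = [0] * 64
    for w in words:
        h = _hash64(0, w)
        for i in range(64):
            counts[i] += 1 if (h >> i) & 1 else -1
    return sum(1 << i for i in range(64) if counts[i] > 0)

def is_near_duplicate(fp_a: int, fp_b: int, k: int = 3) -> bool:
    # k = 3 is an assumed threshold for illustration only.
    return bin(fp_a ^ fp_b).count("1") <= k

if __name__ == "__main__":
    a = tokens("Claim your free prize now, click the link below")
    b = tokens("Claim your FREE prize now! Click the link below")
    print(len(minhash_signature(a)))  # 128 integers per message
    print(minhash_similarity(minhash_signature(a), minhash_signature(b)))  # 1.0
    print(is_near_duplicate(simhash64(a), simhash64(b)))  # True
```

Both messages normalize to the same tokens, so every measure reports a match. The trade-off the quote describes shows up in storage: the min-hash signature holds 128 integers per message, while the simhash fingerprint is a single 64-bit integer.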