2019 34th International Technical Conference on Circuits/Systems, Computers and Communications (ITC-CSCC) 2019
DOI: 10.1109/itc-cscc.2019.8793419

The Document Similarity Index based on the Jaccard Distance for Mail Filtering

Cited by 14 publications
(6 citation statements)
References 0 publications
“…For that purpose, we insert a similarity detector based on the Jaccard distance [55] before the classifier. This is inspired by spam detection techniques that use this distance to seek for characteristic sentences [56]- [58].…”
Section: Training Sample Reduction With Similarity Detection Stage (mentioning)
confidence: 99%
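
The idea in the statement above, a Jaccard-distance similarity detector placed before the classifier to thin out training samples, can be sketched roughly as follows. This is only an illustrative sketch, not the citing paper's implementation: the whitespace tokenization, the function names `jaccard_distance` and `reduce_samples`, and the 0.5 threshold are all assumptions made for the example.

```python
def jaccard_distance(a: set, b: set) -> float:
    """Jaccard distance: 1 - |A ∩ B| / |A ∪ B|."""
    union = a | b
    if not union:
        # Assumption: two empty sets are treated as identical.
        return 0.0
    return 1.0 - len(a & b) / len(union)


def reduce_samples(samples, threshold=0.5):
    """Keep a sample only if it is farther than `threshold`
    from every sample that has already been kept."""
    kept = []
    kept_tokens = []
    for text in samples:
        # Assumption: lowercase, whitespace-separated tokens.
        tokens = set(text.lower().split())
        if all(jaccard_distance(tokens, seen) > threshold for seen in kept_tokens):
            kept.append(text)
            kept_tokens.append(tokens)
    return kept


samples = [
    "win a free prize now",
    "win a free prize today",
    "meeting at noon tomorrow",
]
reduced = reduce_samples(samples)
```

With these hypothetical inputs, the second message shares four of six distinct tokens with the first (distance about 0.33), so it is dropped, while the unrelated third message is kept.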
“…Regarding approximate word matching in item (c) above, various well‐known algorithms based on different word features may be adopted. For item (c)(i), in determining word similarity based on spelling, our PASS‐ToP implementation chose to use the Jaccard Similarity (coefficient), developed by Paul Jaccard [18], as it is one of the simplest and widely used similarity algorithms in data science and text processing applications, such as mail filtering [41]. For item (c)(ii), in determining word similarity based on pronunciation, our PASS‐ToP implementation chose to use the Soundex phonetic algorithm [25].…”
Section: From Pass To Pass‐top: Towards a Versatile Test Oracle (mentioning)
confidence: 99%
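
As a rough illustration of spelling-based word similarity with the Jaccard coefficient, the sketch below compares the sets of characters in two words. The statement above does not say which word features PASS‐ToP uses, so the character-set representation and the function name `jaccard_similarity` are assumptions made for this example.

```python
def jaccard_similarity(word_a: str, word_b: str) -> float:
    """Jaccard coefficient |A ∩ B| / |A ∪ B| over the sets of
    characters in each word (an assumed feature choice)."""
    a, b = set(word_a.lower()), set(word_b.lower())
    union = a | b
    if not union:
        # Assumption: two empty words are treated as identical.
        return 1.0
    return len(a & b) / len(union)


close = jaccard_similarity("color", "colour")  # 4 shared of 5 distinct letters
far = jaccard_similarity("cat", "dog")         # no shared letters
```

Because character sets ignore order and repetition, anagrams score 1.0 under this choice; character bigrams would be a stricter alternative.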
“…The data used social media twitter hoax against the corona virus, politics and the environment with the hoax label as 49.4% and 50.6% non-hoax. Implementation of the Jaccard Index has been conducted to determine the similarity index in the classification of ham and spam e-mails [15]. The method used is the Document Similarity Index (DSI) which is calculated from the Jaccard Index and gets 98% precision results for Ham labels and 98% for Spam labels.…”
Section: Previous Work (mentioning)
confidence: 99%
“…Mail Spam Jaccard Similarity [15] The result of this research is 98% precision for ham and spam labels.…”
Section: Decision (mentioning)
confidence: 99%