2017
DOI: 10.17706/jcp.12.4.362-370

Efficient Cross User Client Side Data Deduplication in Hadoop

Abstract: Hadoop is widely used for applications such as Aadhaar card, healthcare, media, ad platforms, fraud and crime detection, and education. However, it does not provide an efficient, optimized data storage solution. One interesting observation is that when a user uploads the same file twice under the same file name, Hadoop does not allow the duplicate to be saved; but when a user uploads the same file content under a different file name, Hadoop accepts the upload. In general, the same files are uploaded by many users (cross user) with d…
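The pre-upload check the abstract describes lends itself to a short illustration. The following is a minimal Java sketch, not the paper's implementation: it assumes file-level SHA-256 hashing, uses an in-memory map as a stand-in for the HBase hash index, and elides the actual HDFS transfer.

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

public class ClientSideDedup {
    // Stand-in for the HBase table mapping content hash -> HDFS path.
    private final Map<String, String> hashIndex = new HashMap<>();

    // Compute a file-level SHA-256 digest of the local file.
    // readAllBytes is used for brevity; large files would be streamed.
    static String fileHash(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        md.update(Files.readAllBytes(file));
        StringBuilder sb = new StringBuilder();
        for (byte b : md.digest()) sb.append(String.format("%02x", b));
        return sb.toString();
    }

    // Returns the HDFS path the content ends up stored under. If the hash
    // is already known, the upload is skipped and the existing path is
    // returned; otherwise the file would be uploaded (elided here) and the
    // index updated.
    String uploadIfNew(Path localFile, String hdfsPath) throws Exception {
        String hash = fileHash(localFile);
        String existing = hashIndex.get(hash);
        if (existing != null) {
            // Duplicate content from any user: record only a reference.
            return existing;
        }
        // New content: upload to HDFS here, then register the hash.
        hashIndex.put(hash, hdfsPath);
        return hdfsPath;
    }
}

Because the hash is computed on the client before any bytes are sent, duplicate content is caught across users and file names, which is exactly the gap in HDFS behavior the abstract points out.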

Cited by 11 publications (1 citation statement)
References 11 publications (9 reference statements)
“…This wastes storage space and reduces processing efficiency on devices. This paper [47] proposes the DeDup approach to eliminate duplicate data by computing a hash value at the file level before uploading to HDFS and comparing it with those of existing files. When any user downloads a file, HBase is checked via the hash value to determine whether a file with the same content is already stored.…”
Section: Data De-duplication Security Issues (mentioning, confidence: 99%)
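The HBase lookup this citation statement describes could look roughly like the following Java sketch using the standard HBase client API; the table name dedup_index, column family f, and qualifier path are illustrative assumptions, not taken from the paper.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class DedupLookup {
    // Looks up a file-content hash in the HBase index and returns the HDFS
    // path of the stored copy, or null if the content is not yet stored.
    static String lookupPath(String contentHash) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("dedup_index"))) {
            Result result = table.get(new Get(Bytes.toBytes(contentHash)));
            byte[] path = result.getValue(Bytes.toBytes("f"), Bytes.toBytes("path"));
            return path == null ? null : Bytes.toString(path);
        }
    }
}

Keying the table on the content hash rather than the file name is what makes the check work across users: two uploads of identical bytes resolve to the same row regardless of what the files were called.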