A Clustered Inline Deduplication For Secondary Data

Authors: 
Neha Chetan Amale, Department of Information Technology MIT College of Engineering, Pune, India
Jyoti Malhotra, Department of Information Technology MIT College of Engineering, Pune, India
Abstract: 

Data deduplication is a storage-saving technique and a key component of enterprise storage environments. When deduplication techniques first emerged, the focus was on reducing the space requirements of secondary storage and increasing the performance of backup processes. With the help of cluster technology, the throughput and capacity limitations of a single-node solution can be overcome, and exact data deduplication with small chunk sizes can be combined with scalability in one environment. In such a system, shared storage is the foundation for load balancing and fault tolerance, but it suffers from communication overhead. This can be addressed by a clustered dedupe system designed to reduce both intra-node and inter-node deduplication overheads. Similarity-based duplicate detection in the dedupe system is performed using the Jaccard index, and its performance can be further improved using SimHash. By exploiting SimHash and chunk locality, high deduplication throughput with low system overhead can be achieved.
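To make the relationship between the Jaccard index and SimHash concrete, the following is a minimal illustrative sketch (not the system described in the paper): it builds a hypothetical feature set for each chunk from byte shingles, computes the exact Jaccard similarity, and then computes a SimHash fingerprint whose Hamming distance approximates that similarity at constant comparison cost. The helper names, shingle size, and 64-bit fingerprint width are assumptions chosen for illustration.

```python
import hashlib

def chunk_features(chunk: bytes, shingle_size: int = 8) -> set:
    """Break a chunk into overlapping byte shingles (illustrative feature set)."""
    return {chunk[i:i + shingle_size] for i in range(max(1, len(chunk) - shingle_size + 1))}

def jaccard_index(a: set, b: set) -> float:
    """Exact set similarity: |A intersect B| / |A union B|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def simhash(features: set, bits: int = 64) -> int:
    """SimHash fingerprint: similar feature sets produce fingerprints with a
    small Hamming distance, so similarity checks avoid comparing full sets."""
    vector = [0] * bits
    for feature in features:
        h = int.from_bytes(hashlib.md5(feature).digest()[:8], "big")
        for i in range(bits):
            vector[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i in range(bits):
        if vector[i] > 0:
            fingerprint |= 1 << i
    return fingerprint

def hamming_distance(x: int, y: int) -> int:
    return bin(x ^ y).count("1")

# Usage: two nearly identical chunks score high on Jaccard similarity and
# have a small SimHash Hamming distance.
c1 = b"the quick brown fox jumps over the lazy dog" * 4
c2 = b"the quick brown fox jumps over the lazy cat" * 4
f1, f2 = chunk_features(c1), chunk_features(c2)
print("Jaccard:", round(jaccard_index(f1, f2), 3))
print("SimHash Hamming distance:", hamming_distance(simhash(f1), simhash(f2)))
```

In a clustered dedupe system, such fingerprints would let a node route or compare chunks by a single integer rather than exchanging full chunk indexes, which is the kind of overhead reduction the abstract refers to.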
