Data deduplication is a storage-saving technique and a key component of enterprise storage environments. When deduplication techniques first emerged, the focus was on reducing the space requirements of secondary storage and improving backup performance. Cluster technology makes it possible to overcome the throughput and capacity limitations of single-node solutions, and to combine exact deduplication with small chunk sizes and scalability in one environment. In such a system, shared storage is the foundation for load balancing and fault tolerance, but it incurs communication overhead. A clustered Dedupe system can overcome this limitation by reducing both intra-node and inter-node deduplication overheads. Similarity-based duplicate detection in the Dedupe system is done using the Jaccard index; its performance can be further improved using Simhash. By exploiting Simhash and chunk locality, high deduplication throughput with low system overhead can be achieved.
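
To make the two similarity measures named above concrete, the following is a minimal sketch, not the system's actual implementation. It assumes fixed-size chunking and SHA-1 chunk fingerprints purely for illustration (a real system may well use content-defined chunking), and the function names `chunk_fingerprints`, `jaccard`, `simhash`, and `hamming` are hypothetical. The Jaccard index compares two full fingerprint sets, whereas Simhash condenses each set into a single 64-bit signature, so that similarity can be estimated from the Hamming distance between two integers.

```python
import hashlib


def chunk_fingerprints(data, chunk_size=8):
    """Split data into fixed-size chunks (an illustrative assumption)
    and return the set of SHA-1 fingerprints of those chunks."""
    return {
        hashlib.sha1(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    }


def jaccard(a, b):
    """Jaccard index |A n B| / |A u B| of two sets."""
    if not a and not b:
        return 1.0  # convention: two empty sets are treated as identical
    return len(a & b) / len(a | b)


def simhash(fingerprints, bits=64):
    """Condense a set of hex fingerprints into one `bits`-wide signature.

    Each fingerprint votes +1 on every bit position where it has a 1
    and -1 where it has a 0; the signature keeps the bits whose total
    vote is positive.
    """
    mask = (1 << bits) - 1
    votes = [0] * bits
    for fp in fingerprints:
        h = int(fp, 16) & mask  # low `bits` bits of the fingerprint
        for i in range(bits):
            votes[i] += 1 if (h >> i) & 1 else -1
    return sum(1 << i for i in range(bits) if votes[i] > 0)


def hamming(x, y):
    """Number of bit positions in which two signatures differ."""
    return bin(x ^ y).count("1")


if __name__ == "__main__":
    old = chunk_fingerprints(b"A" * 8 + b"B" * 8 + b"C" * 8)
    new = chunk_fingerprints(b"A" * 8 + b"B" * 8 + b"D" * 8)
    print("Jaccard index:", jaccard(old, new))
    print("Simhash Hamming distance:", hamming(simhash(old), simhash(new)))
```

In this sketch the Jaccard computation needs both fingerprint sets in full, while the Simhash comparison needs only two fixed-size signatures. That difference is one plausible reason a compact signature helps when similarity has to be checked across nodes.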