Improving the Deduplication Performance in ZFS

Vinaykumar Bhat and Durga Kamuju
Carnegie Mellon University

ABSTRACT

Internet-scale services have massive storage requirements. Most services use virtual machines (VMs) for deployment, and these VMs are backed up in their entirety for fault tolerance and recovery. Periodically backing up VMs incurs large backup storage costs: most blocks in a VM are not modified between backups, which adds a substantial number of duplicate blocks to the backup system. Deduplication is a widely used technique for reducing disk space consumption and storage costs, and it is especially well suited to workloads with many identical blocks, such as periodic backups. ZFS is an enterprise-grade file system initially developed by Sun Microsystems and now available in several open source implementations [4, 9]. ZFS supports block-level in-line deduplication, but this comes with substantial memory and performance costs. In this paper we explore several performance optimizations that reduce the cost of ZFS deduplication using well-known techniques such as Bloom filters and an efficient in-memory hash table. Our evaluation shows that these optimizations improve the efficiency of the write path in ZFS deduplication.