Rajesh Krishna Balan


File System Design for an NFS File Server Appliance 
Dave Hitz, James Lau, Michael Malcolm

This paper describes the design of a filesystem that is optimized for NFS
and RAID.

Their scheme uses NVRAM to buffer outstanding NFS requests at the server.
The server checkpoints the files on the disk (every 10s), and the requests
held in NVRAM are merged into the disk only when a checkpoint is done.

That is, the files on the disk are always consistent with the last
checkpoint, and new requests are held in NVRAM until the next checkpoint
cycle. This lets the server recover quickly after a power failure: the disk
image is always consistent, so only the outstanding requests in NVRAM need
to be merged into the disk.
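
The recovery argument above can be sketched in a few lines of Python. This
is only a toy model under my own assumptions, not the authors'
implementation: the class ToyServer and its fields (disk, nvram, memory) are
hypothetical names, and a Python dict stands in for the on-disk image.

```python
class ToyServer:
    """Toy model: 'disk' holds the last checkpoint, 'nvram' logs newer requests."""

    def __init__(self):
        self.disk = {}     # consistent image as of the last checkpoint
        self.nvram = []    # requests logged since that checkpoint (survives a crash)
        self.memory = {}   # volatile working state (lost in a crash)

    def write(self, name, data):
        # Log the request in NVRAM, then apply it to the volatile state.
        self.nvram.append((name, data))
        self.memory[name] = data

    def checkpoint(self):
        # Move the disk image to the new consistent state in one step,
        # then drop the log entries that the checkpoint now covers.
        self.disk = dict(self.memory)
        self.nvram.clear()

    def crash(self):
        # Power failure: only the volatile state is lost.
        self.memory = {}

    def recover(self):
        # Start from the last checkpoint and replay only the logged requests.
        self.memory = dict(self.disk)
        for name, data in self.nvram:
            self.memory[name] = data
```

Recovery cost in this model depends only on the number of requests logged
since the last checkpoint, not on the size of the disk image.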

To speed up the checkpointing, the authors propose a scheme to clear the
contents of the NVRAM as soon as possible. The scheme first allocates blocks
on the disk for the requests in the NVRAM and then writes the contents of
the requests to the disk. This allows the server to handle incoming NFS
requests while checkpointing the disk.

The system also provides an easy way (by just creating new top level inodes)
of creating multiple snapshots of the disk. End-users can use these to
recover files that have been accidentally deleted, for example. These
snapshots are visible to the end-user, unlike the checkpointing described
above, which is only visible to the system (and used by it to ensure
filesystem consistency).
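
To illustrate why a snapshot can be as cheap as a new top level inode, here
is a minimal sketch. It assumes that blocks are never overwritten in place
and that a flat name-to-block mapping stands in for the inode tree; ToyFS
and all of its method names are hypothetical, and the real on-disk layout is
certainly more involved.

```python
class ToyFS:
    """Toy copy-on-write store: a snapshot is just a frozen copy of the root."""

    def __init__(self):
        self.blocks = {}      # block id -> contents, never overwritten in place
        self.next_id = 0
        self.root = {}        # active "top level inode": file name -> block id
        self.snapshots = {}   # snapshot name -> frozen copy of a root

    def write(self, name, data):
        # Put the data in a fresh block and repoint only the active root.
        block_id = self.next_id
        self.next_id += 1
        self.blocks[block_id] = data
        self.root[name] = block_id

    def delete(self, name):
        # Removes the name from the active root; snapshots are untouched.
        del self.root[name]

    def snapshot(self, snap_name):
        # Copies only the root mapping; data blocks are shared, not copied.
        self.snapshots[snap_name] = dict(self.root)

    def read(self, name, snap_name=None):
        root = self.root if snap_name is None else self.snapshots[snap_name]
        return self.blocks[root[name]]

    def delete_snapshot(self, snap_name):
        # Drop the snapshot, then free blocks that no root references any more.
        del self.snapshots[snap_name]
        live = set(self.root.values())
        for root in self.snapshots.values():
            live.update(root.values())
        for block_id in list(self.blocks):
            if block_id not in live:
                del self.blocks[block_id]
```

In this sketch, a file deleted from the active root can still be read
through any snapshot taken before the deletion, and its block is freed only
once no snapshot refers to it.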

The main idea I got from this paper was that you should design your systems
explicitly for what you need the system to do. I.e., don't build a general
purpose system for everything. It may be better to build a system that is
optimized to do what you want done rather than a system which can do
everything but slower. This is what the authors have done. Their system only
works for NFS and requires NVRAM. But it works very well for NFS and RAID
disks.

The problems I have with the paper are that the authors don't really talk
about the overhead of their system. I.e., how much space/time is needed to
store all the extra information for the snapshots and the extra information
attached to each datablock to keep track of which snapshot it is in. Also,
it is not clear how much time is needed to remove snapshots and recover
datablocks that are only accessed by that snapshot. Finally, the whole
section on performance analysis leaves much to be desired. It may be true
that their system is unique and thus hard to compare with other systems. But
it is inherently a filesystem and there are metrics to compare filesystems.