Rajesh Krishna Balan

The Design and Implementation of a Log-Structured File System
Mendel Rosenblum, John K. Ousterhout

This paper presents a new filesystem known as the log-structured file system (LFS). The system was designed to speed up writes to disk, and it is based on the observation that as main-memory caches grow, most disk accesses will be writes rather than reads. The filesystem buffers a sequence of disk writes in memory and then writes them to disk as one long sequential write. This eliminates most of the seek time that other filesystems spend finding the place on disk to write each file. The authors also store indexing information in the log itself, so that files can be located later without sequentially searching the entire log.

To work efficiently, LFS requires large amounts of contiguous free disk space in which to write the log. If free space were fragmented, the filesystem would have to scatter the log across the disk, and this would negate its benefits because writes would again need multiple seeks to reach the next free segment. The authors propose a cleaning mechanism that runs in the background and automatically reclaims disk space from segments that are no longer in use. The cleaner also compacts the live data from partially used segments to form larger contiguous regions of free space. This is somewhat similar to garbage collection in modern programming languages.

Finally, the authors evaluate a system employing LFS and show that it performs well compared with conventional UNIX filesystems; their benchmarks compare Sprite LFS against the Unix fast file system (FFS) as implemented in SunOS. The main thing I took away from this paper is that there are always tradeoffs in systems design: LFS improves writes to disk at the cost of having to run an expensive cleaning process.

The major problem with this paper is the overhead associated with the cleaning mechanism. It must run in the background, adds load to the system, and it is unclear how it will perform when disk and/or CPU load is very high. Also, the cleaner needs to identify file access patterns in order to optimize its cleaning policy, and this does not seem easy to figure out.
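
To make the core write path concrete, here is a minimal sketch in Python of the buffering idea described above. The names (LogBuffer, FakeDisk, write_segment, SEGMENT_SIZE) are my own illustrations, not the paper's code, and the segment size is just a plausible value.

    SEGMENT_SIZE = 512 * 1024  # 512 KB segments; illustrative, not the paper's exact value

    class FakeDisk:
        """Stand-in for a raw device; one call = one large sequential write."""
        def write_segment(self, data: bytes):
            print(f"sequential write of {len(data)} bytes")

    class LogBuffer:
        """Accumulates dirty blocks and flushes them as a single segment."""
        def __init__(self, device):
            self.device = device
            self.pending = []        # blocks waiting to be written
            self.pending_bytes = 0

        def write(self, block: bytes):
            self.pending.append(block)
            self.pending_bytes += len(block)
            if self.pending_bytes >= SEGMENT_SIZE:
                self.flush()

        def flush(self):
            # One long sequential write replaces many small seek-and-write
            # operations; this is the core performance idea of LFS.
            self.device.write_segment(b"".join(self.pending))
            self.pending.clear()
            self.pending_bytes = 0

    buf = LogBuffer(FakeDisk())
    for _ in range(200):
        buf.write(b"x" * 4096)   # many small 4 KB file writes...
    buf.flush()                  # ...become a few large sequential writes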
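
The cleaning tradeoff can be made concrete too. The paper's cost-benefit policy rates a segment by ((1 - u) * age) / (1 + u), where u is the fraction of the segment still live (reading the segment costs 1, rewriting its live data costs u, and 1 - u of the segment is freed) and age is the age of the youngest data in it. Below is a sketch of just the candidate-selection step, with a hypothetical Segment record of my own devising:

    from dataclasses import dataclass

    @dataclass
    class Segment:
        utilization: float  # fraction of the segment still holding live data (u)
        age: float          # age of the youngest block in the segment

    def cost_benefit(seg: Segment) -> float:
        # benefit/cost = (free space generated * age) / (read + rewrite cost)
        return (1.0 - seg.utilization) * seg.age / (1.0 + seg.utilization)

    def pick_segments_to_clean(segments, how_many):
        # Clean the highest-ratio segments first: cold, mostly empty
        # segments are the most attractive candidates.
        return sorted(segments, key=cost_benefit, reverse=True)[:how_many]

This policy is also why the cleaner's behavior depends on access patterns, which is exactly the concern raised above: hot segments tend to empty themselves and are cheap to clean later, while cold segments are worth cleaning even at higher utilization.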