Rajesh Krishna Balan

Scale and Performance in a Distributed File System
John H. Howard, Michael L. Kazar et al.

This paper describes the early development of the Andrew File System (AFS). It details how the designers benchmarked and evaluated the performance of the early prototype version of AFS, and the changes they then made to the system based on those results. The goal of the paper was to find the bottlenecks that prevented AFS from scaling to a large number of users without imposing significant load on the AFS servers. In this sense, the authors have done a fantastic job of finding and then highlighting the various parts of AFS that were preventing it from scaling.

The main points I took away from this paper are:

1) When analyzing a system, you are only as good as your benchmark!! Without a good benchmark, you will not be able to find the deficiencies in your system. Or worse, you might find problems that would never show up under a more realistic workload. The AFS people developed their own benchmark (the Andrew benchmark) that stressed their servers considerably. This benchmark was so useful that it became the norm for future file system research.

2) A good design is great, but in practice there will be a number of unexpected control/data paths that need to be heavily optimized to get good performance from the system. In the AFS case, it was surprising to see that much of the time was spent doing stat() calls, which led the designers to also cache metadata at the clients. The designers also realized that the servers were doing a lot of expensive namei() calls to retrieve inodes given pathnames. As such, they decided to have the clients identify files to the servers by FIDs, which the server can map to inodes directly, rather than by pathnames. Finally, to eliminate the large number of TestAuth() validation calls made by the clients to the server, the authors provided a callback mechanism that gives the server the responsibility of informing a client when a file changes. As such, the client does not need to check the validity of a cached file every time it accesses it. (A sketch of the FID and callback ideas appears after these points.)

3) Finding those unexpected control/data paths requires extensive testing and a good benchmark/test suite, which relates back to point 1. It is crucial to find those bottlenecks, as they can severely limit the performance of the system. And since they are unexpected, it is usually not possible to predict what they will be in advance, even by studying the design really carefully.

4) Finally, there is the tradeoff between performance/scalability on one hand and strict consistency and compatibility with traditional file system semantics on the other. The authors made design choices that provide weaker consistency (session semantics, enforced cheaply through callbacks) and file system semantics that are not completely compatible with traditional BSD semantics (changes to an open file are not visible to other machines on the network until the file is closed) in order to improve the performance of AFS. This is the classic issue of trading off one goal for another in systems building. (The second sketch below illustrates these semantics.)
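To make point 2 concrete, here is a minimal C sketch of the two server-offloading ideas: FIDs that spare the server the per-component namei() walk, and callbacks that spare it the per-access TestAuth() validation traffic. Everything here (struct fid, open_cached(), and so on) is an invented illustration of the idea, not the actual AFS code or RPC interface.

    /* Sketch of the FID and callback ideas from point 2.  All names
     * and fields are illustrative assumptions, not AFS internals. */
    #include <stdio.h>
    #include <stdbool.h>

    /* A FID names a file directly, so the server can map it to an inode
     * with a table lookup instead of a per-component namei() walk. */
    struct fid {
        unsigned int volume;  /* which volume holds the file          */
        unsigned int vnode;   /* index into that volume's file table  */
        unsigned int unique;  /* uniquifier to detect vnode reuse     */
    };

    /* Client-side cache entry.  The has_callback flag replaces the
     * per-access TestAuth() RPC: while the server's promise to notify
     * us of changes stands, the cached copy is valid by definition. */
    struct cache_entry {
        struct fid  fid;
        bool        has_callback;
        const char *local_path;  /* local file holding the cached data */
    };

    /* Open from the cache: no server traffic at all unless the
     * callback has been broken (or was never established). */
    bool open_cached(struct cache_entry *e)
    {
        if (e->has_callback) {
            printf("cache hit: no validation RPC needed\n");
            return true;
        }
        printf("no callback: must fetch file and callback from server\n");
        return false;
    }

    /* Run when the server breaks the callback because some other
     * client stored a new version of the file. */
    void break_callback(struct cache_entry *e)
    {
        e->has_callback = false;
    }

    int main(void)
    {
        struct cache_entry e = { {7, 42, 1}, true, "/cache/0042" };
        open_cached(&e);    /* valid purely by virtue of the callback */
        break_callback(&e); /* another client changed the file        */
        open_cached(&e);    /* now we must go back to the server      */
        return 0;
    }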
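And a second sketch, with equally invented names, of the session semantics from point 4: a write modifies only the local cached copy, and only close() makes the new contents visible to other machines.

    /* Sketch of session semantics: writes stay local, close publishes. */
    #include <stdio.h>

    static char cached_copy[4096]; /* this client's local copy       */
    static char server_copy[4096]; /* what other clients would fetch */

    void afs_write(const char *data)
    {
        /* The write touches only the local copy: under these semantics
         * no other machine on the network can observe it yet. */
        snprintf(cached_copy, sizeof cached_copy, "%s", data);
    }

    void afs_close(void)
    {
        /* Only close ships the whole file back to the server, making
         * the new contents visible to other clients (and causing the
         * server to break their callbacks). */
        snprintf(server_copy, sizeof server_copy, "%s", cached_copy);
    }

    int main(void)
    {
        afs_write("new contents");
        printf("server sees: \"%s\"\n", server_copy); /* still empty */
        afs_close();
        printf("server sees: \"%s\"\n", server_copy); /* now visible */
        return 0;
    }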
Overall, this was a very nice paper to read, but it did highlight some shortcomings of the system.

1) Due to the semantics chosen by the authors, the system would be totally unusable by databases: databases require fine-grained shared access to a file with immediately visible updates, which AFS's whole-file caching and close-time propagation cannot support.

2) The servers still have to keep a lot of state for each client (which gets worse as a client registers more callbacks with the server). As the file system gets larger and each client starts accessing more files, it is unclear whether the servers will be able to keep up with the rate at which the usage patterns of the clients are growing. (The sketch below puts a rough number on this state.)
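To get a rough feel for how fast that state grows, here is a back-of-the-envelope sketch: the server must hold one callback record per (client, cached file) pair, so its state grows multiplicatively. The record layout and the workload numbers are assumptions for illustration, not figures from the paper.

    /* Back-of-the-envelope sketch of server callback state growth.
     * Record layout and workload numbers are invented illustrations. */
    #include <stdio.h>

    struct callback_record {
        unsigned int client_id;
        unsigned int volume, vnode, unique; /* the FID under callback */
    };

    int main(void)
    {
        long clients = 5000, files_per_client = 2000;
        long bytes = clients * files_per_client
                     * (long)sizeof(struct callback_record);
        printf("callback state: ~%ld MB\n", bytes / (1024 * 1024));
        return 0;
    }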