18-845: Internet Services
Carnegie Mellon University, Spring 2015

Syllabus (pdf) | Critiques | Individual Project (IP) | Group Project (GP)

1. Instructors

Prof. David O'Hallaron, droh@cs.cmu.edu, GHC 7517, (412) 268-8199,
Office hours: Wed 4-5pm (or by appt.)

TA: Lou Clark, johnclar@andrew.cmu.edu,
Office hours: Fri 12:30-1:30pm (or by appt.), GHC 5202 (Collaborative Commons)

2. Organization

Class times: Mon and Wed, 2:30-3:50, DH 2105
Web page: www.ece.cmu.edu/~ece845
Class mailing list: 18-845@cs.cmu.edu
Staff mailing list: 18-845-staff@cs.cmu.edu
Blackboard: We will not be using Blackboard.
Course directory: /afs/ece/class/ece845

3. Reference material

There is no required textbook for 18-845. The following are standard references for Linux programming and network programming:
  • Michael Kerrisk, The Linux Programming Interface: A Linux and UNIX System Programming Handbook, No Starch Press, 2010.
  • W. Richard Stevens, Bill Fenner, Andrew M. Rudoff Unix Network Programming: The Sockets Networking API, Volume 1 (3rd Edition), Prentice Hall, 2003.
The CS:APP2e text, which is available in the campus bookstore and on permanent reserve in the Engineering library, covers system-level programming topics such as dynamic linking, process control, Unix I/O, the sockets interface, writing Web servers, and application level concurrency and synchronization:

4. Cluster resources

  • Andrew cluster: linux.andrew.cmu.edu
    • RHEL 6x, 64-bit, login using your Andrew credentials
  • Gates cluster: ghc{01..81}.ghc.andrew.cmu.edu
    • RHEL 6x, 64-bit, login using your Andrew credentials
  • ECE cluster: ece{000-031}.ece.cmu.edu
    • SuSE 12x, 64 bit, login using your ECE credentials
    • See here for details. Contact help@ece.cmu.edu for help with accounts.
  • Other ECE clusters: "games" and "dead mathematician" clusters
    • See here for details. Contact help@ece.cmu.edu for help with accounts.

5. Course schedule

Legend: IP: individual project, GP: group project

Class Date Day Topic Projects Discussion Leader
1 01/12 Mon Intro and welcome Dave O'Hallaron
2 01/14 Wed System design principles IP out Dave O'Hallaron
3 01/19 Mon No class - MLK day ---
4 01/21 Wed Server design: Basics Dave O'Hallaron
5 01/26 Mon Server design: Events vs. threads Dave O'Hallaron
6 01/28 Wed Comparing server performance Lou Clark
7 02/02 Mon Measuring server capacity Lou Clark
8 02/04 Wed Clustering: Request routing Jiayu Liu
9 02/9 Mon Motivating application: Google search Advaya Krishna
10 02/11 Wed Google file system (GFS) IP due, 11:59pm Yitan Li
11 02/16 Mon Distributed computing models: Map-reduce Zhimin Wang
12 02/18 Wed Replication: Paxos Sean Klein
13 02/23 Mon Lock services: Chubby Durga Prasad
14 02/25 Wed Table-based storage: BigTable Chuang Xie
15 03/02 Mon Incremental updates: Percolator Mrigesh Kalvani
16 03/04 Wed Globally distributed storage: Spanner GP abstracts due, 11:59pm Anuj Patel
17 03/9 Mon No class - Spring break ---
18 03/11 Wed No class - Spring break ---
19 03/16 Mon Request tracing: Dapper Vinay Bhat
20 03/18 Wed Key-Value Stores: SILT Sameera Padhye
21 03/23 Mon RAMCloud Darsh Shah
22 03/25 Wed Datacenter overview GP oral mid-term reports Dave O'Hallaron
23 03/30 Mon Energy-efficient computing: FAWN Yijie Ma
24 04/01 Wed Virtual machine overview Debjani Biswas
25 04/06 Mon HW/SW virtualization Omkar Gawde
26 04/08 Wed VM implementations tbd
27 04/13 Mon Nested virtualization tbd
28 04/15 Wed Live migration tbd
29 04/20 Mon No class - GP prep ---
30 04/22 Wed No class - GP reports due Thur GP reports due, Thur 4/23, 11:59pm ---
31 04/27 Mon No class - GP reviews due GP reviews due, Sun 4/26, 11:59pm ---
32 04/29 Wed GP poster session Location: DH 2105all
05/04 Sun GP final reports due, 11:59pm ---

6. Detailed course schedule

Students who are not leading the discussion for a particular class should prepare a single 1-page critique. Unless explictly noted, the critique should cover all papers with a "*".

Bring a hardcopy (no email) of your critique with you to class and give it to the TA after class. He will grade it and return it to you next class.

Class 1: Welcome and intro

Big picture and intellectual overview.

  • Note: No critiques are due today.
  • Eric A. Brewer, Lessons from Giant-Scale Services, IEEE Internet Computing, Vol 5, Num. 4, Aug, 2001. (pdf)

Class 2: System design principles

  • Note: Your critique should list three other examples (not discussed by the authors) of end-to-end arguments in system design.
  • *J. Saltzer, D. Reed, and D. Clark, End-to-End Arguments in System Design, ACM Transactions on Computer Systems, Vol 2, No 4, Nov, 1984. (pdf)
  • Butler Lampson, Hints for Computer System Design ACM Operating Systems Rev. 15, 5 (Oct. 1983), pp 33-48. Reprinted in IEEE Software 1, 1 (Jan. 1984), pp 11-28. (html)

Class 3: No class - MLK day

Class 4: Server design: Basics

  • Note: Please write a single critique covering both papers.
  • *V. Pai, P. Druschel, and W. Zwaenepoel, Flash: An efficient and portable Web server, Proceedings of the USENIX 1999 Annual Technical Conference, 1999. (pdf)
  • *Tim Brecht , David Pariag, and Louay Gammo, accept()able Strategies for Improving Web Server Performance, Proceedings of the USENIX 2004 Annual Technical Conference, June, 2004. (pdf)
  • D. Mosberger and T. Jin. httperf: A Tool for Measuring Web Server Performance. Performance Evaluation Review, Volume 26, Number 3, December 1998, 31-37. (Originally appeared in Proceedings of the 1998 Internet Server Performance Workshop, June, 1998.) ( html) )

Class 5: Server design: Events vs. threads

  • Note: Please write a single critique covering both papers.
  • *Gaurav Banga, Jeff Mogul and Peter Druschel, A scalable and explicit event delivery mechanism for UNIX, in the Proceedings of the USENIX 1999 Technical Conference, June 1999. (pdf)
  • *R. von Behren, J. Condit and E. Brewer, Why Events Are A Bad Idea (for high-concurrency servers), Proceedings of HotOS IX, Lihue, Kauai, Hawaii, May, 2003. (pdf)
  • J. K. Osterhout, Why Threads Are a Bad Idea (for most purposes), Presentation at 1996 Usenix Annual Technical Conference, January, 1996 (pdf)
  • E. A. Lee, The Problem with Threads, IEEE Computer, May, 2006 (pdf)

Class 6: Comparing server performance

  • *David Pariag, Tim Brecht, Ashif Harji, Peter Buhr, and Amol Shukla, Comparing the Performance of Web Server Architectures, EuroSys 2007, Lisbon, Portugal, March, 2007. (pdf)

Class 7: Measuring server capacity

  • *G. Banga and P. Druschel, Measuring the Capacity of a Web Server under Realistic Loads, World Wide Web Journal (Special issue on World Wide Web Characterization and Performance Evaluation), 2(1), May 1999. (pdf)

Class 8: Clustering: Request routing

  • Note: Please write a single critique covering both papers.
  • *V. Pai, M. Aron, G. Banga, M. Svendsen, P. Drushel, W. Zwaenepoel, E. Nahum, Locality-aware request distribution in cluster-based network services. In Proceedings of the Eighth International Conference on Architectural Support for Programming Languages and Operating Systems, October 1998. (pdf)
  • *M. Aron, D. Sanders, P. Druschel, and W. Zwaenepoel. Scalable Content-aware Request Distribution in Cluster-based Network Servers. In Proceedings of the USENIX 2000 Annual Technical Conference, June, 2000. (pdf)

Class 9: Motivating Application: Google search

  • Note: Please write a single critique covering both papers.
  • *Sergey Brin and Larry Page, The Anatomy of a Large-Scale Hypertextual Web Search Engine, Seventh International World Wide Web Conference / Computer Networks 30(1-7): 107-117. 1998. (pdf)
  • *Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd, The PageRank Citation Ranking: Bringing Order to the Web, 1998. (pdf)
  • Ian Rogers, The Google Pagerank Algorithm and How It Works (html) May, 2002.

Class 10: Google file system (GFS)

  • Note: Please write a single critique covering both papers.
  • *Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung, The Google File System, in Proceedings of the 19th ACM Symposium on Operating Systems Principles, October, 2003. (pdf)
  • *Kirk McKusick and Sean Quinlan, GFS: Evolution on Fast-Forward, CACM, March, 2010. (html)

Class 11: Distributed computing models: Map-reduce

  • *J. Dean, and S. Ghemawat, MapReduce: Simplified Data Processing on Large Clusters, in Proceedings of Sixth Symposium on Operating System Design and Implementation, December, 2004. (pdf)

Class 12: Replicaton: Paxos

  • *Tushar Chandra, Robert Griesemer, Joshua Redstone, Paxos Made Live - An Engineering Perspective, in ACM Symposium on Principles of Distributed Computing (PODC '07), Aug, 2007. (html)
  • Michael Swift, "Paxos, Agreement, Consensus", Lecture notes for CS 739, Spring 2012, Univ of Wisc, A clear and concise description of the algorithm and its behavior under various scenarios (pdf)
  • Leslie Lamport, Paxos Made Simple, ACM SIGACT News (Distributed Computing Column) 32, 4 (December 2001) 51-58. (pdf)

Class 13: Lock services: Chubby

  • *M. Burrows, The Chubby Lock Service for Loosely-Coupled Distributed Systems, in Proceedings of the Seventh Symposium on Operating System Design and Implementation, December, 2006. (pdf)

Class 14: Table-based storage: BigTable

  • *F. Chang, J. Dean, S. Ghemawat, W.C. Hsieh, D.A. Wallach, M. Burrows, T. Chandra, A. Fikes, and R. E. Gruber, Bigtable: A Distributed Storage System for Structured Data, in Proceedings of the Seventh Symposium on Operating System Design and Implementation, December, 2006. (pdf)

Class 15: Incremental Web updates: Percolator

  • *D. Peng and F. Dabek, Large-scale Incremental Processing Using Distributed Transactions and Notifications, OSDI, 2010. (pdf)

Class 16: Globally distributed storage: Spanner

  • *J. Corbett, J. Dean, M. Epstein, A. Fikes, C. Frost, J. Furman, S. Ghemawat, A. Gubarev, C. Heiser, P. Hochschild, W. Hsieh, S. Kanthak, E. Kogan, H. Li, A. Lloyd, S. Melnik, D. Mwaura, D. Nagle, S. Quinlan, R. Rao, L. Rolig, Y. Saito, M. Szymaniak, C. Taylor, R. Wang, and D. Woodford, Spanner: Google's Globally-Distributed Database, OSDI'12, 2012, Awarded Jay Lepreau Best Paper Award. (pdf)

Class 17: No class - Spring break

Class 18: No class - Spring break

Class 19: Request tracing: Dapper

  • *Benjamin H. Sigelman, Luiz Andre Barroso, Mike Burrows, Pat Stephenson, Manoj Plakal, Donald Beaver, Saul Jaspan, Chandan Shanbhag, Dapper, a Large-Scale Distributed Systems Tracing Infrastructure, Google Technical Report dapper-2010-1, April 2010. (pdf)

Class 20: Key-value stores: SILT

  • *H. Lim, B. Fan, D. Andersen, M. Kaminsky, SILT: A Memory-Efficient, High-Performance Key-Value Store, SOSP, Oct, 2011. (pdf)

Class 21: RAMCloud

  • *Stephen M. Rumble, Ankita Kejriwal, and John Ousterhout, Log-structured Memory for DRAM-based Storage, FAST'14. Awarded best paper. (pdf)

Class 22: Datacenter Overview

  • Note: Please write a single critique covering the chapter you find most interesting
  • *Luiz Andre Barrosa and Urs Holzle, The Datacenter as a Computer, Morgan & Claypool, 2009. (pdf)

Class 23: Energy-efficient computing: FAWN

  • Note: Please write a single critique covering both papers.
  • *David Andersen, Jason Franklin, Michael Kaminsky, Amar Phanishayee, Lawrence Tan, Vijay Vasudevan, FAWN: A Fast Array of Wimpy Nodes, SOSP 2009, Big Sky, MT. October 2009. Awarded best paper. (pdf)
  • *V. Vasudevan, D. Andersen, M. Kaminsky, L. Tan, J. Franklin, I. Moraru Energy-efficient Cluster Computing with FAWN: Workloads and Implications, Proceedings of e-energy, 2010. Invited paper. (pdf)

Class 24: VM overview

  • Note: Please write a single critique covering both papers.
  • *Mendel Rosenblum and Tal Garfinkel, Virtual Machine Monitors: Current Technology and Future Trends, IEEE Computer, May, 2005. (pdf)
  • *G. Neiger, A. Santoni, F. Leung, D. Rodgers, R. Uhlig, "Intel Virtualization Technology: Hardware Support for Efficient Processor Virtualization", Intel Technology Journal, Aug, 2006. (pdf)

Class 25: HW/SW virtualization

  • *K. Adams, and O. Agesen, A Comparison of Software and Hardware Techniques for x86 Virtualization, In Proceedings of the 12th international conference on Architectural support for programming languages and operating systems, 2006. (pdf)
  • Nikhil Bhatia, "Performance Evaluation of Intel EPT Hardware Assist", VWware white paper, 2009 (pdf)
  • Carl A. Waldspurger, Memory Resource Management in VMware ESX Server, Operating System Design and Implementation (OSDI 02), Boston, MA, Dec, 2002. (html)

Class 26: VM implementations

  • Note: Please write a single critique covering both papers.
  • *P. Barham, B. Dragovic, K. Fraser, S. Hand, T. Harris, A. Ho, R. Neugebauer, I. Pratt, A. Warfiel, Xen and the Art of Virtualization, In Proceedings of the 19th ACM Symposium on Operating Systems Principles, October, 2003. (pdf)
  • *A. Kivity, Y. Kamay, Dor Laor, U. Lublin, A. Liguori, kvm: the Linux Virtual Machine Monitor, Proceedings of the Linux Symposium, Volume 1, pp 225-230, Ottawa, Ontario, Canada, June 2007. (pdf)
  • F. Bellard, QEMU, a Fast and Portable Dynamic Translator, Usenix Annual Technical Conference, 2005. (pdf)

Class 27: Nested Virtualization

  • *M. Ben-Yehuda, M. Day, Z. Dubitzky, M. Factor, N. HarEl, A. Gordon, A. Liguori, O. Wasserman, B. Yassour, The Turtles Project: Design and Implementation of Nested Virtualization, OSDI, 2010. Awarded best paper. (pdf)

Class 28: VM live migration

  • *Christopher Clark, Keir Fraser, Steven Hand, Jacob Gorm Hanseny, Eric July, Christian Limpach, Ian Pratt, Andrew Warfield, Live Migration of Virtual Machines, NSDI '05, 2005. (pdf)

Class 29: No class - GP prep

Class 30: No class - GP reports due

Class 31: No class - GP reviews due

Class 32: GP poster session