18-845: Internet Services
Carnegie Mellon University, Spring 2015

Syllabus (pdf) | Critiques | Individual Project (IP) | Group Project (GP)

1. Instructors

Prof. David O'Hallaron, droh@cs.cmu.edu, GHC 7517, (412) 268-8199,
Office hours: Wed 4-5pm (or by appt.)

TA: Lou Clark, johnclar@andrew.cmu.edu,
Office hours: Fri 12:30-1:30pm (or by appt.), GHC 5202 (Collaborative Commons)

2. Organization

Class times: Mon and Wed, 2:30-3:50, DH 2105
Web page: www.ece.cmu.edu/~ece845
Class mailing list: 18-845@cs.cmu.edu
Staff mailing list: 18-845-staff@cs.cmu.edu
Blackboard: We will not be using Blackboard.
Course directory: /afs/ece/class/ece845

3. Reference material

There is no required textbook for 18-845. The following are standard references for Linux programming and network programming:
  • Michael Kerrisk, The Linux Programming Interface: A Linux and UNIX System Programming Handbook, No Starch Press, 2010.
  • W. Richard Stevens, Bill Fenner, Andrew M. Rudoff Unix Network Programming: The Sockets Networking API, Volume 1 (3rd Edition), Prentice Hall, 2003.
The CS:APP2e text, which is available in the campus bookstore and on permanent reserve in the Engineering library, covers system-level programming topics such as dynamic linking, process control, Unix I/O, the sockets interface, writing Web servers, and application level concurrency and synchronization:

4. Cluster resources

  • Andrew cluster: linux.andrew.cmu.edu
    • RHEL 6x, 64-bit, login using your Andrew credentials
  • Gates cluster: ghc{01..81}.ghc.andrew.cmu.edu
    • RHEL 6x, 64-bit, login using your Andrew credentials
  • ECE cluster: ece{000-031}.ece.cmu.edu
    • SuSE 12x, 64 bit, login using your ECE credentials
    • See here for details. Contact help@ece.cmu.edu for help with accounts.
  • Other ECE clusters: "games" and "dead mathematician" clusters
    • See here for details. Contact help@ece.cmu.edu for help with accounts.

5. Course schedule

Legend: IP: individual project, GP: group project

Class Date Day Topic Projects Discussion Leader
1 01/12 Mon Intro and welcome Dave O'Hallaron
2 01/14 Wed System design principles IP out Dave O'Hallaron
3 01/19 Mon No class - MLK day ---
4 01/21 Wed Server design: Basics Dave O'Hallaron
5 01/26 Mon Server design: Events vs. threads Dave O'Hallaron
6 01/28 Wed Comparing server performance Lou Clark
7 02/02 Mon Measuring server capacity Vijay Jayaram
8 02/04 Wed Clustering: Request routing Jiayu Liu
9 02/9 Mon Motivating application: Google search Advaya Krishna
10 02/11 Wed Google file system (GFS) IP due, 11:59pm Yifan Li
11 02/16 Mon Distributed computing models: Map-reduce Zhimin Wang
12 02/18 Wed Replication: Paxos Sean Klein
13 02/23 Mon Lock services: Chubby Durga Prasad
14 02/25 Wed Table-based storage: BigTable Chuang Xie
15 03/02 Mon Incremental updates: Percolator Mrigesh Kalvani
16 03/04 Wed Stream processing: MillWheel GP abstracts due, 11:59pm Anuj Patel
17 03/9 Mon No class - Spring break ---
18 03/11 Wed No class - Spring break ---
19 03/16 Mon Globally distributed storage: Mesa Sameera Padhye
20 03/18 Wed DRAM-based storage: RAMCloud Darsh Shah
21 03/23 Mon Request tracing: Dapper Vinay Bhat
22 03/25 Wed Datacenter overview GP oral mid-term reports Dave O'Hallaron
23 03/30 Mon Energy-efficient computing: FAWN Yijie Ma
24 04/01 Wed Virtual machine overview Debjani Biswas
25 04/06 Mon HW/SW virtualization Omkar Gawde
26 04/08 Wed VM implementations tbd
27 04/13 Mon Nested virtualization tbd
28 04/15 Wed Live migration tbd
29 04/20 Mon No class - GP prep ---
30 04/22 Wed No class - GP reports due Thur GP reports due, Thur 4/23, 11:59pm ---
31 04/27 Mon No class - GP reviews due GP reviews due, Sun 4/26, 11:59pm ---
32 04/29 Wed GP poster session Location: DH 2105all
05/04 Sun GP final reports due, 11:59pm ---

6. Detailed course schedule

Students who are not leading the discussion for a particular class should prepare a single 1-page critique. Unless explictly noted, the critique should cover all papers with a "*".

Bring a hardcopy (no email) of your critique with you to class and give it to the TA after class. He will grade it and return it to you next class.

Class 1: Welcome and intro

Big picture and intellectual overview.

  • Note: No critiques are due today.
  • Eric A. Brewer, Lessons from Giant-Scale Services, IEEE Internet Computing, Vol 5, Num. 4, Aug, 2001. (pdf)

Class 2: System design principles

  • Note: Your critique should list three other examples (not discussed by the authors) of end-to-end arguments in system design.
  • *J. Saltzer, D. Reed, and D. Clark, End-to-End Arguments in System Design, ACM Transactions on Computer Systems, Vol 2, No 4, Nov, 1984. (pdf)
  • Butler Lampson, Hints for Computer System Design ACM Operating Systems Rev. 15, 5 (Oct. 1983), pp 33-48. Reprinted in IEEE Software 1, 1 (Jan. 1984), pp 11-28. (html)

Class 3: No class - MLK day

Class 4: Server design: Basics

  • Note: Please write a single critique covering both papers.
  • *V. Pai, P. Druschel, and W. Zwaenepoel, Flash: An efficient and portable Web server, Proceedings of the USENIX 1999 Annual Technical Conference, 1999. (pdf)
  • *Tim Brecht , David Pariag, and Louay Gammo, accept()able Strategies for Improving Web Server Performance, Proceedings of the USENIX 2004 Annual Technical Conference, June, 2004. (pdf)
  • D. Mosberger and T. Jin. httperf: A Tool for Measuring Web Server Performance. Performance Evaluation Review, Volume 26, Number 3, December 1998, 31-37. (Originally appeared in Proceedings of the 1998 Internet Server Performance Workshop, June, 1998.) ( html) )

Class 5: Server design: Events vs. threads

  • Note: Please write a single critique covering both papers.
  • *Gaurav Banga, Jeff Mogul and Peter Druschel, A scalable and explicit event delivery mechanism for UNIX, in the Proceedings of the USENIX 1999 Technical Conference, June 1999. (pdf)
  • *Mike Bayer, Asynchronous Python and Databases, (pdf), Feb 15, 2015.
  • R. von Behren, J. Condit and E. Brewer, Why Events Are A Bad Idea (for high-concurrency servers), Proceedings of HotOS IX, Lihue, Kauai, Hawaii, May, 2003. (pdf)
  • J. K. Osterhout, Why Threads Are a Bad Idea (for most purposes), Presentation at 1996 Usenix Annual Technical Conference, January, 1996 (pdf)
  • E. A. Lee, The Problem with Threads, IEEE Computer, May, 2006 (pdf)

Class 6: Comparing server performance

  • *David Pariag, Tim Brecht, Ashif Harji, Peter Buhr, and Amol Shukla, Comparing the Performance of Web Server Architectures, EuroSys 2007, Lisbon, Portugal, March, 2007. (pdf)

Class 7: Measuring server capacity

  • *G. Banga and P. Druschel, Measuring the Capacity of a Web Server under Realistic Loads, World Wide Web Journal (Special issue on World Wide Web Characterization and Performance Evaluation), 2(1), May 1999. (pdf)

Class 8: Clustering: Request routing

  • Note: Please write a single critique covering both papers.
  • *V. Pai, M. Aron, G. Banga, M. Svendsen, P. Drushel, W. Zwaenepoel, E. Nahum, Locality-aware request distribution in cluster-based network services. In Proceedings of the Eighth International Conference on Architectural Support for Programming Languages and Operating Systems, October 1998. (pdf)
  • *M. Aron, D. Sanders, P. Druschel, and W. Zwaenepoel. Scalable Content-aware Request Distribution in Cluster-based Network Servers. In Proceedings of the USENIX 2000 Annual Technical Conference, June, 2000. (pdf)

Class 9: Motivating Application: Google search

  • Note: Please write a single critique covering both papers.
  • *Sergey Brin and Larry Page, The Anatomy of a Large-Scale Hypertextual Web Search Engine, Seventh International World Wide Web Conference / Computer Networks 30(1-7): 107-117. 1998. (pdf)
  • *Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd, The PageRank Citation Ranking: Bringing Order to the Web, 1998. (pdf)
  • Ian Rogers, The Google Pagerank Algorithm and How It Works (html) May, 2002.

Class 10: Google file system (GFS)

  • Note: Please write a single critique covering both papers.
  • *Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung, The Google File System, in Proceedings of the 19th ACM Symposium on Operating Systems Principles, October, 2003. (pdf)
  • *Kirk McKusick and Sean Quinlan, GFS: Evolution on Fast-Forward, CACM, March, 2010. (html)

Class 11: Distributed computing models: Map-reduce

  • *J. Dean, and S. Ghemawat, MapReduce: Simplified Data Processing on Large Clusters, in Proceedings of Sixth Symposium on Operating System Design and Implementation, December, 2004. (pdf)

Class 12: Replicaton: Paxos

  • *Tushar Chandra, Robert Griesemer, Joshua Redstone, Paxos Made Live - An Engineering Perspective, in ACM Symposium on Principles of Distributed Computing (PODC '07), Aug, 2007. (html)
  • Diego Ongaro and John Ousterhout, In Search of an Understandable Consensus Algorithm, USENIX, 2014. (https://www.usenix.org/system/files/conference/atc14/atc14-paper-ongaro.pdf)
  • Michael Swift, "Paxos, Agreement, Consensus", Lecture notes for CS 739, Spring 2012, Univ of Wisc, A clear and concise description of the algorithm and its behavior under various scenarios (pdf)
  • Leslie Lamport, Paxos Made Simple, ACM SIGACT News (Distributed Computing Column) 32, 4 (December 2001) 51-58. (pdf)

Class 13: Lock services: Chubby

  • *M. Burrows, The Chubby Lock Service for Loosely-Coupled Distributed Systems, in Proceedings of the Seventh Symposium on Operating System Design and Implementation, December, 2006. (pdf)

Class 14: Table-based storage: BigTable

  • *F. Chang, J. Dean, S. Ghemawat, W.C. Hsieh, D.A. Wallach, M. Burrows, T. Chandra, A. Fikes, and R. E. Gruber, Bigtable: A Distributed Storage System for Structured Data, in Proceedings of the Seventh Symposium on Operating System Design and Implementation, December, 2006. (pdf)

Class 15: Incremental Web updates: Percolator

  • *D. Peng and F. Dabek, Large-scale Incremental Processing Using Distributed Transactions and Notifications, OSDI, 2010. (pdf)

Class 16: Stream processing: MillWheel

  • *Tyler Akidau, Alex Balikov, Kaya Bekiroglu, Slava Chernyak, Josh Haberman, Reuven Lax, Sam McVeety, Daniel Mills, Paul Nordstrom, Sam Whittle, MillWheel: Fault-Tolerant Stream Processing at Internet Scale, VLDB, 2013. (pdf)

Class 17: No class - Spring break

Class 18: No class - Spring break

Class 19: Globally distributed storage: Mesa

  • *Ashish Gupta, Fan Yang, Jason Govig, Adam Kirsch, Kelvin Chan, Kevin Lai, Shuo Wu, Sandeep Dhoot, Abhilash Kumar, Ankur Agiwal, Sanjay Bhansali, Mingsheng Hong, Jamie Cameron, Masood Siddiqi, David Jones, Jeff Shute, Andrey Gubarev, Shivakumar Venkataraman, Divyakant Agrawal, Mesa: Geo-Replicated, Near Real-Time, Scalable Data Warehousing, VLDB, 2014. (pdf)

Class 20: DRAM-based storage: RAMCloud

  • *Stephen M. Rumble, Ankita Kejriwal, and John Ousterhout, Log-structured Memory for DRAM-based Storage, FAST'14. Awarded best paper. (pdf)

Class 21: Request tracing: Dapper

  • *Benjamin H. Sigelman, Luiz Andre Barroso, Mike Burrows, Pat Stephenson, Manoj Plakal, Donald Beaver, Saul Jaspan, Chandan Shanbhag, Dapper, a Large-Scale Distributed Systems Tracing Infrastructure, Google Technical Report dapper-2010-1, April 2010. (pdf)

Class 22: Datacenter Overview

  • Note: Please write a single critique covering the chapter you find most interesting (skip Chapter 6 on Modeling Costs)
  • *Luiz Andre Barrosa, Jimmy Clidaras, and Urs Holzle, The Datacenter as a Computer, Second Edition, Morgan & Claypool, July 2013. (pdf)

Class 23: Energy-efficient computing: FAWN

  • Note: Please write a single critique covering both papers.
  • *David Andersen, Jason Franklin, Michael Kaminsky, Amar Phanishayee, Lawrence Tan, Vijay Vasudevan, FAWN: A Fast Array of Wimpy Nodes, SOSP 2009, Big Sky, MT. October 2009. Awarded best paper. (pdf)
  • *V. Vasudevan, D. Andersen, M. Kaminsky, L. Tan, J. Franklin, I. Moraru Energy-efficient Cluster Computing with FAWN: Workloads and Implications, Proceedings of e-energy, 2010. Invited paper. (pdf)

Class 24: VM overview

  • Note: Please write a single critique covering both papers.
  • *Mendel Rosenblum and Tal Garfinkel, Virtual Machine Monitors: Current Technology and Future Trends, IEEE Computer, May, 2005. (pdf)
  • *G. Neiger, A. Santoni, F. Leung, D. Rodgers, R. Uhlig, "Intel Virtualization Technology: Hardware Support for Efficient Processor Virtualization", Intel Technology Journal, Aug, 2006. (pdf)

Class 25: HW/SW virtualization

  • *K. Adams, and O. Agesen, A Comparison of Software and Hardware Techniques for x86 Virtualization, In Proceedings of the 12th international conference on Architectural support for programming languages and operating systems, 2006. (pdf)
  • Nikhil Bhatia, "Performance Evaluation of Intel EPT Hardware Assist", VWware white paper, 2009 (pdf)
  • Carl A. Waldspurger, Memory Resource Management in VMware ESX Server, Operating System Design and Implementation (OSDI 02), Boston, MA, Dec, 2002. (html)

Class 26: VM implementations

  • Note: Please write a single critique covering both papers.
  • *P. Barham, B. Dragovic, K. Fraser, S. Hand, T. Harris, A. Ho, R. Neugebauer, I. Pratt, A. Warfiel, Xen and the Art of Virtualization, In Proceedings of the 19th ACM Symposium on Operating Systems Principles, October, 2003. (pdf)
  • *A. Kivity, Y. Kamay, Dor Laor, U. Lublin, A. Liguori, kvm: the Linux Virtual Machine Monitor, Proceedings of the Linux Symposium, Volume 1, pp 225-230, Ottawa, Ontario, Canada, June 2007. (pdf)
  • F. Bellard, QEMU, a Fast and Portable Dynamic Translator, Usenix Annual Technical Conference, 2005. (pdf)

Class 27: Nested Virtualization

  • *M. Ben-Yehuda, M. Day, Z. Dubitzky, M. Factor, N. HarEl, A. Gordon, A. Liguori, O. Wasserman, B. Yassour, The Turtles Project: Design and Implementation of Nested Virtualization, OSDI, 2010. Awarded best paper. (pdf)

Class 28: VM live migration

  • *Christopher Clark, Keir Fraser, Steven Hand, Jacob Gorm Hanseny, Eric July, Christian Limpach, Ian Pratt, Andrew Warfield, Live Migration of Virtual Machines, NSDI '05, 2005. (pdf)

Class 29: No class - GP prep

Class 30: No class - GP reports due

Class 31: No class - GP reviews due

Class 32: GP poster session