18-845: Internet Services
Carnegie Mellon University, Spring 2019

Syllabus (pdf) | Critiques | Individual Project (IP) | Group Project (GP)

1. Instructors

Prof. David O'Hallaron, droh@cs.cmu.edu, GHC 7517
Office hours: Mon 4:10-5:30pm (or by appt.)

TA: Nolan Hiehle, nhiehle@andrew.cmu.edu
Office hours: Thu 3-4pm, 1300 wing of Hamerschlag Hall, EE Lounge (or by appt.)

2. Organization

Class times: Mon and Wed, 2:30-3:50pm, DH 2105
Web page: www.ece.cmu.edu/~ece845
Class mailing list: 18-845@cs.cmu.edu
Blackboard: We will not be using Blackboard.
Piazza: We will not be using Piazza.
Course directory: /afs/ece/class/ece845

3. Reference material

There is no required textbook for 18-845. The following are standard references for Linux programming and network programming:
  • Michael Kerrisk, The Linux Programming Interface: A Linux and UNIX System Programming Handbook, No Starch Press, 2010.
  • W. Richard Stevens, Bill Fenner, Andrew M. Rudoff, Unix Network Programming: The Sockets Networking API, Volume 1 (3rd Edition), Prentice Hall, 2003.
The CS:APP3e text, which is available in the campus bookstore and on permanent reserve in the Engineering library, covers system-level programming topics such as dynamic linking, process control, Unix I/O, the sockets interface, writing Web servers, and application-level concurrency and synchronization.

4. Linux cluster resources

  • Andrew cluster: linux.andrew.cmu.edu
    • RHEL, 64-bit, login using your Andrew credentials
  • SCS Gates cluster: ghc{26..86}.ghc.andrew.cmu.edu
    • RHEL, 64-bit, login using your Andrew credentials
    • Machines ghc{26..46} contain NVIDIA GeForce GTX 1080 GPUs. The Wikipedia entry for GeForce 10 GPUs provides useful information about this model of GPU. They support CUDA compute capability 6.1.
  • ECE cluster: ece{000-031}.ece.local.cmu.edu
    • SuSE, 64-bit, login using your ECE credentials
    • See here for details. Contact help@ece.cmu.edu for help with accounts.

5. Course schedule (final version)

Legend: IP: individual project, GP: group project

Class | Date | Day | Topic | Projects | Discussion Leader
1 | 01/14 | Mon | Intro and welcome | | Dave O'Hallaron
2 | 01/16 | Wed | System design principles | IP out | Dave O'Hallaron
3 | 01/21 | Mon | No class - MLK day | | ---
4 | 01/23 | Wed | Server design basics | | Dave O'Hallaron
5 | 01/28 | Mon | Event delivery mechanisms | | Dave O'Hallaron
6 | 01/30 | Wed | Comparing server performance | | Dave O'Hallaron
7 | 02/04 | Mon | Measuring server capacity | | Harihara Abinaya
8 | 02/06 | Wed | Motivating application: Google search | | Chris Smith
9 | 02/11 | Mon | Google file system (GFS) | | Bhavini Mishra
10 | 02/13 | Wed | Data processing: MapReduce | IP due 11:59pm | David Simon
11 | 02/18 | Mon | Stream processing: Samza | | Nolan Hiehle
12 | 02/20 | Wed | Advanced processing: Cloud Dataflow | | Gagan Gangorthri
13 | 02/25 | Mon | Advanced processing: TensorFlow | | Mingquan Chen
14 | 02/27 | Wed | Advanced processing: Apache Spark | | Sam Westenberg
15 | 03/04 | Mon | Replication: Paxos | | Chaoying Wang
16 | 03/06 | Wed | Lock services: Chubby | | Eric Sun
17 | 03/11 | Mon | No class - Spring break | | ---
18 | 03/13 | Wed | No class - Spring break | | ---
19 | 03/18 | Mon | Table-based storage: BigTable | GP abstracts due, 11:59pm | Shravya Kaudki Srinivas
20 | 03/20 | Wed | Distributed stores: Spanner | | Zeleena Kearney
21 | 03/25 | Mon | Distributed stores: memcached | | Pallavi Rajan
22 | 03/27 | Wed | Distributed stores: Tao | | Jayadheep Shanmugam
23 | 04/01 | Mon | DRAM-based storage: RAMCloud | | Nolan Hiehle
24 | 04/03 | Wed | Datacenter management: Borg | GP oral mid-term reports due, in class, 2:30-4:20pm | Dave O'Hallaron
25 | 04/08 | Mon | Warehouse-scale computing | | Dave O'Hallaron
26 | 04/10 | Wed | Virtual machines: VMWare | | Kinsang Ching
27 | 04/15 | Mon | Virtual machines: Xen | | Sriharsha Bandaru
28 | 04/17 | Wed | Virtual machines: Live migration | | Dhruvesh Rathore
29 | 04/22 | Mon | No class | | ---
30 | 04/24 | Wed | No class | GP reports due Wed 4/24, 11:59pm | ---
31 | 04/29 | Mon | No class | GP reviews due Sunday 4/28, 11:59pm | ---
32 | 05/01 | Wed | GP poster session | Location: DH 2105 | all
 | 05/05 | Sun | | GP final reports due Sunday 5/5, 11:59pm | ---

6. Detailed course schedule (final version)

Students who are not leading the discussion for a particular class should prepare a single 1-page critique. Unless explicitly noted, the critique should cover all papers marked with a "*".

Bring a hardcopy (no email) of your critique with you to class and give it to the TA after class. The TA will grade it and return it to you at the next class.

Class 1: Welcome and intro

Class 2: System design principles

  • Note: Your critique should list three other examples (not discussed by the authors) of end-to-end arguments in system design.
  • *J. Saltzer, D. Reed, and D. Clark, End-to-End Arguments in System Design, ACM Transactions on Computer Systems, Vol 2, No 4, Nov, 1984. (pdf)

Class 3: No class - MLK day

Class 4: Server design: Basics

  • Note: Please write a single critique covering both papers.
  • *V. Pai, P. Druschel, and W. Zwaenepoel, Flash: An efficient and portable Web server, Proceedings of the USENIX 1999 Annual Technical Conference, 1999. (pdf)
  • *Tim Brecht, David Pariag, and Louay Gammo, accept()able Strategies for Improving Web Server Performance, Proceedings of the USENIX 2004 Annual Technical Conference, June, 2004. (pdf)

Class 5: Event delivery mechanisms

  • *Gaurav Banga, Jeff Mogul and Peter Druschel, A scalable and explicit event delivery mechanism for UNIX, in the Proceedings of the USENIX 1999 Technical Conference, June 1999. (pdf)

Class 6: Comparing server performance

  • *David Pariag, Tim Brecht, Ashif Harji, Peter Buhr, and Amol Shukla, Comparing the Performance of Web Server Architectures, EuroSys 2007, Lisbon, Portugal, March, 2007. (pdf)
  • Ashif S. Harji, Peter A. Buhr, Tim Brecht, Comparing High-Performance Multi-core Web-Server Architectures, SYSTOR'12, ACM, 2012. (pdf)

Class 7: Measuring server capacity

  • *G. Banga and P. Druschel, Measuring the Capacity of a Web Server under Realistic Loads, World Wide Web Journal (Special issue on World Wide Web Characterization and Performance Evaluation), 2(1), May 1999. (pdf)

Class 8: Motivating application: Google search

  • Note: Please write a single critique covering both papers.
  • *Sergey Brin and Larry Page, The Anatomy of a Large-Scale Hypertextual Web Search Engine, Seventh International World Wide Web Conference / Computer Networks 30(1-7): 107-117. 1998. (pdf)
  • *Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd, The PageRank Citation Ranking: Bringing Order to the Web, 1998. (pdf)
  • Ian Rogers, The Google Pagerank Algorithm and How It Works, May, 2002. (html)

Class 9: Google file system (GFS)

  • Note: Please write a single critique covering both papers.
  • *Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung, The Google File System, in Proceedings of the 19th ACM Symposium on Operating Systems Principles, October, 2003. (pdf)
  • *Kirk McKusick and Sean Quinlan, GFS: Evolution on Fast-Forward, CACM, March, 2010. (html)

Class 10: Data processing

  • *J. Dean and S. Ghemawat, MapReduce: Simplified Data Processing on Large Clusters, in Proceedings of Sixth Symposium on Operating System Design and Implementation, December, 2004. (pdf)
  • J. Somers, The Friendship that Made Google Huge, New Yorker, Dec 3, 2018. Beautiful article about the 20-year friendship between Jeff Dean and Sanjay Ghemawat and the unique pair programming approach they've used to build some of Google's most important systems. (html)

Class 11: Stream processing

  • *Shadi A. Noghabi, Kartik Paramasivam, Yi Pan, Navina Ramesh, Jon Bringhurst, Indranil Gupta, and Roy H. Campbell, Samza: Stateful Scalable Stream Processing at LinkedIn, VLDB Endowment, Aug, 2017. (pdf)
  • Jay Kreps, Neha Narkhede, Jun Rao, Kafka: a Distributed Messaging System for Log Processing, NetDB'11, 2011. (pdf)

Class 12: Advanced processing: Cloud Dataflow

  • *Tyler Akidau, Robert Bradshaw, Craig Chambers, Slava Chernyak, Rafael J. Fernández-Moctezuma, Reuven Lax, Sam McVeety, Daniel Mills, Frances Perry, Eric Schmidt, Sam Whittle, The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing, VLDB Endowment, 2015. (pdf)

Class 13: Advanced processing: TensorFlow

  • *Martin Abadi et al., TensorFlow: A System for Large-Scale Machine Learning, OSDI'16, 2016. (pdf)

Class 14: Advanced processing: Apache Spark

  • *Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauley, Michael J. Franklin, Scott Shenker, and Ion Stoica, Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing, NSDI'12, 2012. Awarded best paper. (pdf)

Class 15: Replication: Paxos

  • *Tushar Chandra, Robert Griesemer, Joshua Redstone, Paxos Made Live - An Engineering Perspective, in ACM Symposium on Principles of Distributed Computing (PODC '07), Aug, 2007. (html)
  • Michael Swift, "Paxos, Agreement, Consensus", Lecture notes for CS 739, Spring 2012, Univ of Wisc. A clear and concise description of the algorithm and its behavior under various scenarios. (pdf)
  • Angus MacDonald, Paxos by Example, Web post, 2012. (html). Helpful step-by-step example with multiple leaders.
  • Diego Ongaro and John Ousterhout, In Search of an Understandable Consensus Algorithm, USENIX, 2014. (pdf)
  • Leslie Lamport, Paxos Made Simple, ACM SIGACT News (Distributed Computing Column) 32, 4 (December 2001) 51-58. (pdf)

Class 16: Lock services: Chubby

  • *M. Burrows, The Chubby Lock Service for Loosely-Coupled Distributed Systems, in Proceedings of the Seventh Symposium on Operating System Design and Implementation (OSDI'06), December, 2006. (pdf)

Class 17: No class - Spring break

Class 18: No class - Spring break

Class 19: Table-based storage: BigTable

  • *F. Chang, J. Dean, S. Ghemawat, W.C. Hsieh, D.A. Wallach, M. Burrows, T. Chandra, A. Fikes, and R. E. Gruber, Bigtable: A Distributed Storage System for Structured Data, in Proceedings of the Seventh Symposium on Operating System Design and Implementation (OSDI'06), December, 2006. (pdf)

Class 20: Distributed stores: Spanner

  • *J. Corbett, J. Dean, M. Epstein, A. Fikes, C. Frost, J. Furman, S. Ghemawat, A. Gubarev, C. Heiser, P. Hochschild, W. Hsieh, S. Kanthak, E. Kogan, H. Li, A. Lloyd, S. Melnik, D. Mwaura, D. Nagle, S. Quinlan, R. Rao, L. Rolig, Y. Saito, M. Szymaniak, C. Taylor, R. Wang, and D. Woodford, Spanner: Google's Globally-Distributed Database, OSDI'12, 2012, Jay Lepreau Best Paper Award. (pdf)

Class 21: Distributed stores: memcached

  • *R. Nishtala et al., Scaling Memcache at Facebook, NSDI '13. (pdf)

Class 22: Distributed stores: Tao

  • *Nathan Bronson, Zach Amsden, George Cabrera, Prasad Chakka, Peter Dimov, Hui Ding, Jack Ferris, Anthony Giardullo, Sachin Kulkarni, Harry Li, Mark Marchukov, Dmitri Petrov, Lovro Puzar, Yee Jiun Song, and Venkat Venkataramani, TAO: Facebook's Distributed Data Store for the Social Graph, USENIX, 2013. (pdf)

Class 23: DRAM-based storage: RAMCloud

  • *Stephen M. Rumble, Ankita Kejriwal, and John Ousterhout, Log-structured Memory for DRAM-based Storage, FAST'14. Awarded best paper. (pdf)
  • John Ousterhout, Parag Agrawal, David Erickson, Christos Kozyrakis, Jacob Leverich, David Mazieres, Subhasish Mitra, Aravind Narayanan, Diego Ongaro, Guru Parulkar, Mendel Rosenblum, Stephen M. Rumble, Eric Stratmann, and Ryan Stutsman, The Case for RAMCloud, CACM, July, 2011. (pdf)
  • Consider Fast Crash Recovery in RAMCloud (Ongaro, 2011)

Class 24: Datacenter management

  • *Abhishek Verma, Luis Pedrosa, Madhukar Korupolu, David Oppenheimer, Eric Tune, John Wilkes, Large-scale cluster management at Google with Borg, EuroSys 2015, Bordeaux, France. (pdf)

Class 25: Warehouse-scale computing

  • Note: Please write a single critique covering the chapter you find most interesting (skip Chapter 6 on Modeling Costs and Chapter 7 on Failures and Repairs).
  • *Luiz Andre Barroso, Jimmy Clidaras, and Urs Holzle, The Datacenter as a Computer, Second Edition, Morgan & Claypool, July 2013. (pdf)

Class 26: Virtual machines: VMWare

  • *K. Adams and O. Agesen, A Comparison of Software and Hardware Techniques for x86 Virtualization, in Proceedings of the 12th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS'06), 2006. (pdf)
  • G. Neiger, A. Santoni, F. Leung, D. Rodgers, R. Uhlig, "Intel Virtualization Technology: Hardware Support for Efficient Processor Virtualization", Intel Technology Journal, Aug, 2006. Please skip all discussion of the Itanium's VT-i. (pdf)
  • Ole Agesen, Alex Garthwaite, Jeffrey Sheldon, Pratap Subrahmanyam, The Evolution of an x86 Virtual Machine Monitor, ACM SIGOPS Operating Systems Review, Volume 44, Issue 4, December 2010. (pdf)
  • Mendel Rosenblum and Tal Garfinkel, Virtual Machine Monitors: Current Technology and Future Trends, IEEE Computer, May, 2005. (pdf)
  • Wes Felter, Alexandre Ferreira, Ram Rajamony, Juan Rubio, Updated Performance Comparison of Virtual Machines and Linux Containers, IBM Research Report, RC25482 (AUS1407-001), July 21, 2014. (pdf)

Class 27: Virtual machines: Xen

  • *P. Barham, B. Dragovic, K. Fraser, S. Hand, T. Harris, A. Ho, R. Neugebauer, I. Pratt, A. Warfield, Xen and the Art of Virtualization, in Proceedings of the 19th ACM Symposium on Operating Systems Principles, October, 2003. (pdf)

Class 28: Virtual machines: Live migration

  • *Christopher Clark, Keir Fraser, Steven Hand, Jacob Gorm Hansen, Eric Jul, Christian Limpach, Ian Pratt, Andrew Warfield, Live Migration of Virtual Machines, NSDI '05, 2005. (pdf)

Class 29: No class

Class 30: No class

Class 31: No class

Class 32: GP poster session