18-845: Internet Services
Carnegie Mellon University, Spring 2017
Syllabus (pdf) | Critiques | Individual Project (IP) | Group Project (GP)
1. Instructors
Prof. David O'Hallaron,
droh@cs.cmu.edu, GHC 7517
Office hours: Mon 4-5pm (or by appt.)
TA: Nate Horan, nhoran@andrew.cmu.edu
Office hours: Mon 11am-12pm, Wean coffee shop or Library; Thu 2-3pm, GHC 3rd floor
2. Organization
Class times: Mon and Wed, 2:30-3:50, DH 2105
Web page: www.ece.cmu.edu/~ece845
Class mailing list: 18-845@cs.cmu.edu
Blackboard: We will not be using Blackboard.
Piazza: We will not be using Piazza.
Course directory: /afs/ece/class/ece845
3. Reference material
There is no required textbook for 18-845.
The following are standard references for Linux programming and network programming:
- Michael Kerrisk, The Linux Programming Interface: A Linux and UNIX
System Programming Handbook, No Starch Press, 2010.
- W. Richard Stevens, Bill Fenner, Andrew M. Rudoff, Unix Network
Programming: The Sockets Networking API, Volume 1 (3rd Edition),
Prentice Hall, 2003.
The CS:APP3e text, which is
available in the campus bookstore and on permanent reserve in the
Engineering library, covers system-level programming topics such as
dynamic linking, process control, Unix I/O, the sockets interface,
writing Web servers, and application level concurrency and
synchronization.
4. Linux cluster resources
- Andrew cluster: linux.andrew.cmu.edu
- RHEL, 64-bit, login using your Andrew credentials
- SCS Gates cluster: ghc{26..86}.ghc.andrew.cmu.edu
- RHEL, 64-bit, login using your Andrew credentials
- ECE cluster: ece{000-031}.ece.local.cmu.edu
- SuSE, 64-bit, login using your ECE credentials
- See here for details. Contact help@ece.cmu.edu for help with accounts.
5. Course schedule
Legend: IP: individual project, GP: group project
6. Detailed course schedule
Students who are not leading the discussion for a particular class
should prepare a single 1-page critique. Unless
explicitly noted, the critique should cover all papers marked with a "*".
Bring a hardcopy (no email) of your critique to class and give it to
the TA after class.
The TA will grade it and return it to you at the next class.
Class 1: No class - MLK day
Class 2: Welcome and intro
Class 3: System design principles
- Note: Your critique should list three other examples (not discussed
by the authors) of end-to-end arguments in system design.
- *J. Saltzer, D. Reed, and D. Clark,
End-to-End Arguments in System Design,
ACM Transactions on Computer Systems, Vol 2, No 4, Nov, 1984.
(pdf)
Class 4: Server design: Basics
- Note: Please write a single critique covering both papers.
- *V. Pai, P. Druschel, and W. Zwaenepoel, Flash: An efficient and portable
Web server, Proceedings of the USENIX 1999 Annual Technical Conference,
1999. (pdf)
- *Tim Brecht, David Pariag, and Louay Gammo,
accept()able Strategies for Improving Web Server Performance,
Proceedings of the USENIX 2004 Annual Technical Conference, June, 2004.
(pdf)
- D. Mosberger and T. Jin. httperf: A Tool for Measuring Web Server
Performance. Performance Evaluation Review, Volume 26, Number 3,
December 1998, 31-37. (Originally appeared in Proceedings of the 1998
Internet Server Performance Workshop, June, 1998.)
(html)
Class 5: Comparing server performance
- *David Pariag, Tim Brecht, Ashif Harji, Peter Buhr, and Amol Shukla,
Comparing the Performance of Web Server Architectures,
EuroSys 2007, Lisbon, Portugal, March, 2007.
(pdf)
- Ashif S. Harji, Peter A. Buhr, Tim Brecht, Comparing
High-Performance Multi-core Web-Server Architectures, SYSTOR'12, ACM, 2012.
(pdf)
Class 6: Measuring server capacity
- *G. Banga and P. Druschel,
Measuring the Capacity of a Web Server under Realistic Loads,
World Wide Web Journal (Special issue on World Wide Web
Characterization and Performance Evaluation), 2(1), May 1999.
(pdf)
Class 7: Motivating Application: Google search
- Note: Please write a single critique covering both papers.
- *Sergey Brin and Larry Page,
The Anatomy of a Large-Scale Hypertextual Web Search Engine,
Seventh International World
Wide Web Conference / Computer Networks 30(1-7): 107-117. 1998.
(pdf)
- *Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd,
The PageRank Citation Ranking: Bringing Order to the Web,
1998.
(pdf)
- Ian Rogers, The Google Pagerank Algorithm and How It Works,
May, 2002.
(html)
Class 8: Google file system (GFS)
- Note: Please write a single critique covering both papers.
- *Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung,
The Google File System, in Proceedings of the
19th ACM Symposium on Operating Systems Principles,
October, 2003.
(pdf)
- *Kirk McKusick and Sean Quinlan, GFS: Evolution on Fast-Forward, CACM, March, 2010.
(html)
Class 9: Distributed data processing
- Note: Please write a single critique covering both papers.
- *J. Dean and S. Ghemawat, MapReduce: Simplified Data Processing on Large
Clusters, in Proceedings of the Sixth Symposium on Operating Systems Design and
Implementation (OSDI'04), December, 2004.
(pdf)
- Craig Chambers, Ashish Raniwala, Frances Perry, Stephen Adams, Robert Henry, Robert Bradshaw, Nathan,
FlumeJava: Easy, Efficient Data-Parallel Pipelines, PLDI, 2010.
(html)
Class 10: Distributed stream processing
- *Tyler Akidau, Alex Balikov, Kaya Bekiroglu, Slava Chernyak, Josh Haberman,
Reuven Lax, Sam McVeety, Daniel Mills, Paul Nordstrom, Sam Whittle,
MillWheel: Fault-Tolerant Stream Processing at Internet Scale, VLDB, 2013.
(pdf)
Class 11: Advanced processing: Cloud Dataflow
- *Tyler Akidau, Robert Bradshaw, Craig Chambers, Slava Chernyak,
Rafael J. Fernández-Moctezuma, Reuven Lax, Sam McVeety, Daniel Mills,
Frances Perry, Eric Schmidt, Sam Whittle, The Dataflow Model: A
Practical Approach to Balancing Correctness, Latency, and Cost in
Massive-Scale, Unbounded, Out-of-Order Data Processing, VLDB
Endowment, 2015. Based on MillWheel and FlumeJava.
(pdf)
Class 12: Advanced processing: TensorFlow
- *Martín Abadi et al., TensorFlow: A System for Large-Scale Machine Learning,
OSDI'16, 2016.
(pdf)
Class 13: Advanced processing: Apache Spark
- *Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma,
Murphy McCauley, Michael J. Franklin, Scott Shenker, Ion Stoica,
Resilient Distributed Datasets: A Fault-Tolerant Abstraction for
In-Memory Cluster Computing,
NSDI'12, 2012. Awarded best paper.
(pdf)
Class 14: Replication: Paxos
- *Tushar Chandra, Robert Griesemer, Joshua Redstone, Paxos Made Live -
An Engineering Perspective, in ACM Symposium on Principles of
Distributed Computing (PODC '07), Aug, 2007.
(html)
- Michael Swift, "Paxos, Agreement, Consensus", Lecture notes for CS
739, Spring 2012, Univ of Wisc. A clear and concise description of
the algorithm and its behavior under various scenarios.
(pdf)
- Angus MacDonald, Paxos by Example, Web post, 2012.
(html).
Helpful step-by-step example with multiple leaders.
- Diego Ongaro and John Ousterhout, In Search of an Understandable
Consensus Algorithm, USENIX, 2014.
(pdf)
- Leslie Lamport, Paxos Made Simple, ACM SIGACT News (Distributed
Computing Column) 32, 4 (December 2001) 51-58.
(pdf)
Class 15: Lock services: Chubby
- *M. Burrows, The Chubby Lock Service for Loosely-Coupled Distributed Systems, in
Proceedings of the Seventh Symposium on Operating Systems Design and Implementation (OSDI'06),
December, 2006.
(pdf)
Class 16: Table-based storage: BigTable
- *F. Chang, J. Dean, S. Ghemawat, W.C. Hsieh, D.A. Wallach, M. Burrows,
T. Chandra, A. Fikes, and R. E. Gruber, Bigtable: A Distributed Storage
System for Structured Data, in Proceedings of the Seventh Symposium on
Operating Systems Design and Implementation (OSDI'06), December, 2006.
(pdf)
Class 17: No class - Spring break
Class 18: No class - Spring break
Class 19: Distributed Stores: Spanner
- *J. Corbett, J. Dean, M. Epstein, A. Fikes, C. Frost, J. Furman,
S. Ghemawat, A. Gubarev, C. Heiser, P. Hochschild, W. Hsieh,
S. Kanthak, E. Kogan, H. Li, A. Lloyd, S. Melnik, D. Mwaura, D. Nagle,
S. Quinlan, R. Rao, L. Rolig, Y. Saito, M. Szymaniak, C. Taylor,
R. Wang, and D. Woodford,
Spanner: Google's Globally-Distributed Database,
OSDI'12, 2012, Jay Lepreau Best Paper Award.
(pdf)
Class 20: Distributed Stores: memcached
- *R. Nishtala et al.,
Scaling Memcache at Facebook, NSDI '13.
(pdf)
Class 21: Distributed Stores: Tao
- Nathan Bronson, Zach Amsden, George Cabrera, Prasad Chakka, Peter
Dimov, Hui Ding, Jack Ferris, Anthony Giardullo, Sachin Kulkarni,
Harry Li, Mark Marchukov, Dmitri Petrov, Lovro Puzar, Yee Jiun Song,
and Venkat Venkataramani,
TAO: Facebook's Distributed Data Store for the Social Graph,
Usenix, 2013.
(pdf)
- Haonan Lu, Kaushik Veeraraghavan, Philippe Ajoux, Jim Hunt,
Yee Jiun Song, Wendy Tobagus, Sanjeev Kumar, Wyatt Lloyd,
Existential Consistency:
Measuring and Understanding Consistency at Facebook,
SOSP'15, 2015.
(pdf)
Class 22: DRAM-based storage: RAMCloud
- *Stephen M. Rumble, Ankita Kejriwal, and John Ousterhout,
Log-structured Memory for DRAM-based Storage, FAST'14. Awarded best paper.
(pdf)
- John Ousterhout, Parag Agrawal, David Erickson,
Christos Kozyrakis, Jacob Leverich, David Mazieres,
Subhasish Mitra, Aravind Narayanan, Diego Ongaro,
Guru Parulkar, Mendel Rosenblum, Stephen M. Rumble,
Eric Stratmann, and Ryan Stutsman,
The Case for RAMCloud,
CACM, July, 2011.
(pdf)
Class 23: Datacenter Management
- *Abhishek Verma, Luis Pedrosa, Madhukar Korupolu, David Oppenheimer,
Eric Tune, John Wilkes,
Large-scale cluster management at Google with Borg,
EuroSys 2015, Bordeaux, France.
(pdf)
- Luiz Andre Barroso, Jimmy Clidaras, and Urs Holzle, The Datacenter as a
Computer, Second Edition, Morgan & Claypool, July 2013.
(pdf)
Class 24: Datacenter networking: Jupiter
- *A. Singh et al, Jupiter Rising: A Decade of Clos Topologies and
Centralized Control in Google’s Datacenter Network, SIGCOMM'15, 2015.
(pdf)
Class 23: Energy-efficient computing: FAWN
- *David Andersen, Jason Franklin, Michael Kaminsky, Amar Phanishayee,
Lawrence Tan, Vijay Vasudevan, FAWN: A Fast Array of Wimpy
Nodes, SOSP 2009, Big Sky, MT. October 2009. Awarded best paper.
(pdf)
- V. Vasudevan, D. Andersen, M. Kaminsky, L. Tan, J. Franklin, I. Moraru,
Energy-efficient Cluster Computing with FAWN: Workloads and Implications,
Proceedings of e-energy, 2010. Invited paper.
(pdf)
Class 25: VM overview
- Note: Please write a single critique covering both papers.
- *Mendel Rosenblum and Tal Garfinkel, Virtual Machine Monitors:
Current Technology and Future Trends, IEEE Computer, May, 2005.
(pdf)
- *K. Adams and O. Agesen, A Comparison of Software and Hardware
Techniques for x86 Virtualization, In Proceedings of the 12th
international conference on Architectural support for programming
languages and operating systems (ASPLOS'06), 2006.
(pdf)
- G. Neiger, A. Santoni, F. Leung, D. Rodgers, R. Uhlig, "Intel
Virtualization Technology: Hardware Support for Efficient Processor
Virtualization", Intel Technology Journal, Aug, 2006.
Please skip all discussion of the Itanium's VT-i.
(pdf)
Class 26: VM Implementation: VMWare
- *Ole Agesen, Alex Garthwaite, Jeffrey Sheldon, Pratap Subrahmanyam,
The Evolution of an x86 Virtual Machine Monitor,
ACM SIGOPS Operating Systems Review archive
Volume 44 Issue 4, December 2010.
(pdf)
Class 27: VM implementation: Xen
- *P. Barham, B. Dragovic, K. Fraser, S. Hand,
T. Harris, A. Ho, R. Neugebauer, I. Pratt, A. Warfield,
Xen and the Art of Virtualization, In
Proceedings of the 19th ACM Symposium on Operating Systems Principles,
October, 2003.
(pdf)
Class 28: Request tracing
- Note: Please write a single critique covering both papers.
- *Benjamin H. Sigelman, Luiz Andre Barroso, Mike Burrows, Pat Stephenson,
Manoj Plakal, Donald Beaver, Saul Jaspan, Chandan Shanbhag,
Dapper, a Large-Scale Distributed Systems Tracing Infrastructure,
Google Technical Report dapper-2010-1, April 2010.
(pdf)
- *Michael Chow, David Meisner, Jason Flinn, Daniel Peek,
The Mystery Machine: End-to-end Performance
Analysis of Large-scale Internet Services,
OSDI, 2014.
(pdf)
Class 29: No class - GP prep
Class 30: No class - GP reports due
Class 31: No class - GP reviews due
Class 32: GP poster session