Readings are assigned for some class meetings. They usually consist of
relevant technical papers, articles, or instructor-prepared notes. Paper
copies of assigned readings and notes will be provided in class, and
electronic copies will be available online. Note, however, that the
online versions are accessible only from a 128.2.* (CMU) IP address or
a local-only CMU IP address.
The readings listed should be read BEFORE class on the assigned
day.
December 6: Memory-based distributed storage
Guest Speaker: Bin Fan, Alluxio
- Tachyon: Reliable, Memory Speed Storage for Cluster Computing Frameworks. Haoyuan Li, Ali Ghodsi, Matei Zaharia, Scott Shenker, Ion Stoica. ACM Symposium on Cloud Computing (SOCC), 2014.
(available here)
December 4 (L16): Non-Volatile Memory (NVM) file systems
- System software for persistent memory. Subramanya R. Dulloor, Sanjay Kumar, Anil Keshavamurthy, Philip Lantz, Dheeraj Reddy, Rajesh Sankaran, Jeff Jackson. ACM European Conference on Computer Systems (EuroSys), 2014.
(pdf)
- Recommended optional readings:
- Better I/O through byte-addressable persistent memory. Jeremy Condit, Edmund B. Nightingale, Christopher Frost, Engin Ipek, Benjamin Lee, Doug Burger, Derrick Coetzee. ACM Symposium on Operating Systems Principles (SOSP), 2009. (available here)
- NOVA: A Log-structured File System for Hybrid Volatile/Non-volatile Main Memories. Jian Xu, Steven Swanson. USENIX Conference on File and Storage Technologies (FAST), 2016.
(pdf)
November 20: Evolution of Google FS
Guest Speaker: Larry Greenfield, Google
- Recommended optional reading:
The Tail at Scale. Jeffrey Dean, Luiz André Barroso. Communications of the ACM, 56(2), February 2013. (pdf)
November 15 (L15): More reliability techniques
- Architectures and Algorithms for On-Line Failure Recovery in Redundant Disk Arrays. Mark Holland, Garth A. Gibson, and Daniel P. Siewiorek. Appears in the Journal of Distributed and Parallel Databases, Vol. 2, No. 3, July 1994.
(available here)
- Scalable performance of the Panasas Parallel File System. (from Lecture 12 below).
November 13 (L14): Storage for data-intensive computing
- The Google File System. Sanjay Ghemawat, Howard Gobioff, Shun-Tak Leung. Appears in ACM Symposium on Operating Systems Principles (SOSP), 2003.
(pdf)
- Bigtable: A Distributed Storage System for Structured Data. Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, Robert E. Gruber. Appears in USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2006.
(pdf)
- Recommended optional reading:
- MapReduce: Simplified Data Processing on Large Clusters. Jeffrey Dean and Sanjay Ghemawat. Appears in USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2004.
(pdf)
November 8 (L13): Data protection
- Designing for disasters. Kimberly Keeton, Cipriano Santos, Dirk Beyer, Jeffrey Chase, John Wilkes. Appears in the Proceedings of the Third USENIX Conference on File and Storage Technologies (FAST'04). (pdf)
- Optional reading:
- Getting Back Up: Understanding How Enterprise Data Backups Fail. George Amvrosiadis, Medha Bhadkamkar. Appears in the 2016 USENIX Annual Technical Conference (ATC). (pdf)
November 6 (L12): Parallel File Systems
- GPFS: A Shared-Disk File System for Large Computing Clusters. Frank Schmuck and Roger Haskin. Appears in FAST, January 2002. (pdf)
- Scalable performance of the Panasas Parallel File System. Brent Welch, Marc Unangst, et al. Appears in FAST, February 2008. (pdf)
- Serving Data to the Lunatic Fringe: The Evolution of HPC Storage. John Bent, Brad Settlemyer, and Gary Grider. Appears in USENIX ;login:, Summer 2016, Vol. 41, No. 2. (pdf)
November 1 (L11): Multi-server Distributed file systems
- Same readings as Lecture 10 (L10).
- Recommended optional reading:
Serverless Network File Systems (xFS). Thomas E. Anderson, Michael D. Dahlin, Jeanna M. Neefe, David A. Patterson, Drew S. Roselli, Randolph Y. Wang. Appears in the ACM Transactions on Computer Systems, Vol. 14, No. 1. February 1996. (pdf)
October 30: Nonvolatile memory (NVM) in Computer Systems
Guest Speaker: Frank Hady, Intel
- Platform Storage Performance With 3D XPoint Technology. Frank T. Hady, Annie Foong, Bryan Veal and Dan Williams. Proceedings of the IEEE, August 2017, pp.1822-1833. (Full Paper)
- Recommended optional reading:
The Nonvolatile Memory Transformation of Client Storage. Amber Huffman and Dale Juenemann. IEEE Computer, August 2013, pp.38-44. (Full Paper)
- Optional fun: check out reviews of the Intel 900P SSD
October 9 (L10): Distributed file systems and NAS Interfaces
- The Design and Implementation of the 4.4BSD Operating System (Marshall Kirk McKusick, Keith Bostic, Michael J. Karels, and John S. Quarterman, 1996)
Chapter 9 (The Network Filesystem) (pdf)
- Recommended optional reading:
- Scale and Performance in a Distributed File System. John H. Howard, Michael L. Kazar, Sherri G. Menees, David A. Nichols, M. Satyanarayanan, Robert N. Sidebotham, Michael J. West. Appears in the ACM Transactions on Computer Systems, Vol. 6, No. 1, Pages 51-81. February 1988. (pdf)
Optional reading:
- Operating Systems: Three Easy Pieces. Remzi H. Arpaci-Dusseau and Andrea C. Arpaci-Dusseau. March 2015.
- Chapter 47 (Distributed Systems): available here.
- Chapter 48 (Network File System (NFS)): available here.
- Chapter 49 (Andrew File System (AFS)): available here.
- NFS Version 3 Protocol Specification (RFC 1813: B. Callaghan, B. Pawlowski, P. Staubach, June 1995) (txt)
October 4 (L9): Disk array systems
- RAID: High-Performance, Reliable Secondary Storage. Peter M. Chen,
Edward K. Lee, Garth A. Gibson, Randy H. Katz, and David A. Patterson.
ACM Computing Surveys (CSUR), June 1994.
(PDF, same as L8)
- System Impacts of Storage Trends: Hard Errors and Testability.
Steven R. Hetzler. USENIX ;login: Vol. 36, No. 3, June 2011.
(PDF)
- Optional additional reading
- Mean time to meaningless: MTTDL, Markov models, and
storage system reliability. Kevin M. Greenan, James S. Plank,
Jay J. Wylie. 2nd USENIX Workshop on Hot Topics in Storage and File Systems
(HotStorage '10). June 2010.
(PDF)
- Parity Lost and Parity Regained. Andrew Krioukov,
Lakshmi N. Bairavasundaram, Garth R. Goodson, Kiran Srinivasan, Randy Thelen,
Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau. USENIX Conference on File
and Storage Technologies (FAST), 2008.
(PDF)
October 2: Design and Evolution of WAFL
Guest Speaker: Ram Kesavan, NetApp
- Strongly Recommended optional reading
- File System Design for an NFS File Server Appliance. David Hitz, James Lau, Michael Malcolm. 1994 USENIX Winter Conference. (pdf)
- Optional additional reading (papers that he discussed)
- Algorithms and Data Structures for Efficient Free Space Reclamation
in WAFL. Ram Kesavan, Rohit Singh, Travis Grusecki, Yuvraj Patel.
USENIX Conference on File and Storage Technologies (FAST), 2017.
(Abstract and PDF)
- Scalable Write Allocation in the WAFL File System.
Matthew Curtis-Maury, Ram Kesavan, Mrinal K. Bhattacharjee.
International Conference on Parallel Processing (ICPP), 2017.
(PDF)
- To Waffinity and Beyond: A Scalable Architecture for Incremental
Parallelization of File System Code.
Matthew Curtis-Maury, Vinay Devadas, Vania Fang, Aditya Kulkarni.
USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2016.
(Abstract and PDF)
- Think Global, Act Local: A Buffer Cache Design for Global Ordering
and Parallel Processing in the WAFL File System.
Peter R. Denz, Matthew Curtis-Maury, Vinay Devadas.
International Conference on Parallel Processing (ICPP), 2016.
(PDF)
- High Performance Metadata Integrity Protection in the WAFL
Copy-on-Write File System. Harendra Kumar, Yuvraj Patel, Ram Kesavan, Sumith Makam.
USENIX Conference on File and Storage Technologies (FAST), 2017.
(Abstract and PDF)
September 27 (L8): Disk array organization
- RAID: High-Performance, Reliable Secondary Storage. Peter M. Chen,
Edward K. Lee, Garth A. Gibson, Randy H. Katz, and David A. Patterson.
ACM Computing Surveys (CSUR), June 1994.
(PDF)
- Disk failures in the real world: What does an MTTF of 1,000,000 hours
mean to you? Bianca Schroeder and Garth A. Gibson. USENIX Conference
on File and Storage Technologies (FAST), 2007.
(PDF)
- Optional additional reading
- Flash Reliability in Production: The Expected and the
Unexpected. Bianca Schroeder, Raghav Lagisetty, and Arif Merchant.
USENIX Conference on File and Storage Technologies (FAST), 2016.
(PDF)
- Operating Systems: Three Easy Pieces. Remzi H. Arpaci-Dusseau and Andrea C. Arpaci-Dusseau. March 2015.
- Chapter 38 (Redundant Arrays of Inexpensive Disks (RAIDs)): available here.
September 25 (L7): Potpourri
- MTBF Description, Kevin Dally. 1995. (txt)
September 20 (L6): Caching and FS integrity
- Soft Updates: A Solution to the Metadata Update Problem. Gregory R. Ganger, Marshall Kirk McKusick, Craig A.N. Soules, Yale N. Patt. ACM Transactions on Computer Systems. May 2000. (pdf)
- Practical File System Design with the Be File System (Dominic Giampaolo, 1999)
- Chapter 7 (Journaling) (pdf)
- Optional additional reading
- Operating Systems: Three Easy Pieces. Remzi H. Arpaci-Dusseau and Andrea C. Arpaci-Dusseau. March 2015.
- Chapter 42 (Crash Consistency: FSCK and Journaling): available here.
- Verifying File System Consistency at Runtime. Daniel Fryer, Kuei Sun, Rahat Mahmood, TingHao Cheng, Shaun Benjamin, Ashvin Goel, Angela Demke Brown. Conference on File and Storage Technologies (FAST), 2012. (pdf)
September 18 (L5): File system organization
- UNIX Internals: The New Frontiers. Uresh Vahalia. 1996.
- Chapter 8 (File system interface and framework) (pdf)
- Recommended optional: Practical File System Design with the Be File System. Dominic Giampaolo. 1999.
- Chapter 2 (What is a file system?) (pdf)
- Optional additional reading
- Operating Systems: Three Easy Pieces. Remzi H. Arpaci-Dusseau and Andrea C. Arpaci-Dusseau. March 2015.
- Chapter 39 (Files and Directories): available here.
- Chapter 40 (File System Implementation): available here.
September 13 (L4): File system storage layout
- The Design and Implementation of the 4.4BSD Operating System (Marshall Kirk McKusick, Keith Bostic, Michael J. Karels, and John S. Quarterman, 1996)
- Chapter 8 (Local filestores) (pdf)
- F2FS: A New File System for Flash Storage. Changman Lee, Dongho Sim, Jooyoung Hwang, Sangyeon Cho. Appears in FAST 2015. (pdf)
- Optional additional reading
- TableFS: Enhancing Metadata Efficiency in the Local File System. Kai Ren and Garth Gibson. Published as Technical Report CMU-PDL-13-102, January 2013. (available here)
- BTRFS: The Linux B-Tree Filesystem. Ohad Rodeh, Josef Bacik, and Chris Mason. Appears in ACM Transactions on Storage, Vol. 9, No. 3, Article 9, Publication date: August 2013. (pdf)
- Operating Systems: Three Easy Pieces. Remzi H. Arpaci-Dusseau and Andrea C. Arpaci-Dusseau. March 2015.
- Chapter 40 (File System Implementation): available here.
- Chapter 43 (Log-structured File Systems): available here.
September 11 (L3): Hard Disk Drives: Components and Operation
- An Introduction to Disk Drive Modeling. Chris Ruemmler and John Wilkes, 1994. (pdf)
- Scheduling Algorithms for Modern Disk Drives. Bruce L. Worthington, Gregory R. Ganger, and Yale N. Patt. Appears in the Proceedings of the ACM SIGMETRICS Conference. May 1994. (ps)
- Optional additional reading
- Digital Large System Mass Storage Handbook. Paul Massiglia. 1986.
- Operating Systems: Three Easy Pieces. Remzi H. Arpaci-Dusseau and Andrea C. Arpaci-Dusseau. March 2015.
- Chapter 37 (Hard Disk Drives): available here.
August 30 (L2): Flash Storage
- Design Tradeoffs for SSD Performance.
Nitin Agrawal, Vijayan Prabhakaran, Ted Wobber, John D. Davis, Mark Manasse, and Rina Panigrahy. Published in 2008 USENIX Annual Technical Conference.
- Recommended optional: The Unwritten Contract of Solid State Drives,
Jun He, Sudarsun Kannan, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau.
Published in EuroSys 2017.
- Recommended optional: Operating System Support for NVM+DRAM Hybrid Main Memory,
Jeffrey C. Mogul, Eduardo Argollo, Mehul Shah, Paolo Faraboschi.
Published in HotOS 2009.
August 28 (L1): Introduction to 746 and Storage performance metrics
This first meeting will be more than just organizational in nature.
Of course, we will discuss how the class is going to work and what will
(and won't) be covered.
See the 15-746/18-746 overview for a recap of the
general information.
We will also dive into the course by discussing storage performance metrics.
Even though you likely won't have read them before class, there are readings
associated with Lecture 1:
- Computer Architecture: A Quantitative Approach, 3rd ed. John L. Hennessy and David A. Patterson. 2002. (pdf)
- Section 7.7: "I/O performance measures"
- Section 7.8: "A Little queuing theory"
- Section 7.9: "Benchmarks of storage performance and availability"
- Recommended: Probability Refresher. Mor Harchol-Balter. 2000. (pdf)