Electrical and Computer Engineering

18-749 – Fault-Tolerant Distributed Systems

12 units

This course provides an in-depth, hands-on overview of designing and developing fault-tolerant distributed systems. It covers both fundamental and advanced concepts of dependability, including replication, atomic multicast, group communication, consistency, checkpointing, transaction processing and fault injection, along with industrial standards and real-world practices for achieving high availability and fault tolerance. Additional topics include the practical trade-offs and inter-relationships between fault tolerance and other properties, such as real-time behavior and performance. The lectures are complemented by a semester-long hands-on project that involves the design, implementation and empirical evaluation of a fault-tolerant, high-performance distributed system. To introduce students to state-of-the-art technologies, the project emphasizes the use of object-oriented middleware, such as CORBA and EJB.

3 hrs. lec., 9 hrs. lab.

Prerequisites: Experience in programming and senior or graduate standing.

Prerequisite for: 18-849

Last updated on March 20, 2007

ECE classifications

Undergraduate areas

Computer Software

Graduate areas

Software Systems and Computer Networking

Past semesters

S06, S05, S04, S03, S02, F99, F98, S97

Please note that the course history information is incomplete and/or may reflect different courses offered under the same course number.



5000 Forbes Avenue / Pittsburgh, PA 15213-3890 / Phone: 412-268-7400 / Fax: 412-268-2860