Improving the end-to-end dependability of distributed systems

Tuesday Sept. 21, 2010
Hamerschlag Hall D-210
4:00 pm


Tudor Dumitras
Electrical and Computer Engineering Dept., Carnegie Mellon University


Traditional fault-tolerance approaches concentrate almost entirely on responding to, avoiding, or tolerating unexpected faults or security violations. However, scheduled events, such as software upgrades, account for most system unavailability and often introduce data loss or latent errors. In this talk, I will present two empirical studies that identify the leading cause of upgrade failure (breaking hidden dependencies) and the leading cause of planned downtime (changing database schemas) in distributed enterprise systems. I will also describe Imago, a system that incorporates end-to-end mechanisms for improving the dependability of large-scale distributed systems that undergo major software upgrades.

The key idea behind Imago is to isolate the production system from the upgrade operations in order to avoid breaking hidden dependencies. The end-to-end upgrade is an atomic operation, executed online even when it involves complex schema and data conversions. Imago harnesses the opportunities provided by emerging technologies, such as cloud computing, to simplify major enterprise-system upgrades and to improve their dependability. This approach separates the functional aspects of the upgrade (e.g., data migration) from the mechanisms for online upgrade (e.g., atomic switchover), enabling an upgrades-as-a-service model.


Tudor Dumitras is a Ph.D. candidate in the Electrical and Computer Engineering Department at Carnegie Mellon University, working with Prof. Priya Narasimhan. His research focuses on improving the dependability of large-scale distributed systems (addressing operator errors during software upgrades), of enterprise systems (addressing the predictability of fault-tolerant middleware), and of embedded systems (addressing soft errors in networks-on-chip). He received the 2009 John Vlissides Award from ACM SIGPLAN for showing significant promise in applied software research, and the Best Paper Award at ASP-DAC'03. He holds undergraduate degrees from the Ecole Polytechnique in Paris and the “Politehnica” University in Bucharest.

