My name is Soila Pertet. I am a PhD student in the Department of Electrical and Computer Engineering at Carnegie Mellon University. My research focuses on proactive problem determination in distributed systems.

CONTACT

Soila Pertet
Department of Electrical and Computer Engineering
Carnegie Mellon University
Pittsburgh, PA 15213
Email: spertet AT ece DOT cmu DOT edu
Tel: 412-268-9608

BIO

  • I am a PhD student in the ECE department at Carnegie Mellon University. I am working with my advisor, Prof. Priya Narasimhan, on the MEAD project. I received my Master's degree in Electrical and Computer Engineering from Carnegie Mellon University in 2004.
  • In 2000, I graduated with my Bachelor's degree in Computer Science from the University of Nairobi, Kenya. (Kenya is one of the most beautiful places on earth; I hope you get a chance to visit some day.)
  • I worked for two years as a programmer at 3mice Interactive Media Ltd. in Nairobi, Kenya, before starting graduate school.


Me at Uhuru Peak, Mt Kilimanjaro (Jan 2006)
Highest point in Africa. Altitude: 5895 m

RESEARCH

Proactive Problem Determination in Distributed Systems
My research focuses on determining the root cause of failures ("fingerpointing") in distributed systems. I am particularly interested in correlated failures, for instance those due to shared infrastructure such as fault-tolerant protocols, or to resource contention by other applications on a single node. I would like to understand which system metrics to monitor and whether we can apply a hierarchical approach to problem diagnosis, e.g., using low-overhead metrics to flag a failure and then turning on more instrumentation to drill down on it. A key aspect of this is understanding the tradeoffs between monitoring overhead and the granularity of problem diagnosis. The icing on the cake would be not only to diagnose failures but also to discover patterns that help us predict failures and apply proactive fault-recovery approaches.
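
The hierarchical idea above can be illustrated with a small sketch. This is only an illustration under stated assumptions, not the method used in this research: the function names, the metric names, the 3-sigma threshold and the 50% tolerance are all hypothetical. A cheap check on one coarse metric decides whether the more expensive, fine-grained metrics are collected at all.

```python
from statistics import mean, stdev


def flag_anomaly(history, latest, threshold=3.0):
    """Stage 1 (low overhead): flag `latest` if it lies more than
    `threshold` standard deviations from the mean of `history`."""
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return latest != mu
    return abs(latest - mu) / sigma > threshold


def drill_down(detailed_metrics, baselines, tolerance=0.5):
    """Stage 2 (higher overhead): return the names of detailed metrics
    whose relative deviation from their baseline exceeds `tolerance`."""
    suspects = []
    for name, value in detailed_metrics.items():
        base = baselines[name]
        if base and abs(value - base) / base > tolerance:
            suspects.append(name)
    return sorted(suspects)


def diagnose(history, latest, collect_detailed, baselines):
    """Run the cheap check first; call the (hypothetical) expensive
    collector `collect_detailed` only when the cheap check fires."""
    if not flag_anomaly(history, latest):
        return []
    return drill_down(collect_detailed(), baselines)


if __name__ == "__main__":
    # Hypothetical data: coarse response times (ms) and per-resource baselines.
    history = [100, 102, 98, 101, 99]
    baselines = {"cpu_pct": 20.0, "disk_wait_ms": 5.0, "net_retransmits": 2.0}

    def collect_detailed():
        # Stand-in for switching on extra instrumentation.
        return {"cpu_pct": 22.0, "disk_wait_ms": 40.0, "net_retransmits": 2.0}

    print(diagnose(history, 180, collect_detailed, baselines))
```

The tradeoff mentioned above shows up directly in this sketch: the detailed collector costs nothing while the coarse metric looks normal, but the diagnosis can only be as fine-grained as the metrics that collector gathers once it is switched on.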

PATENT APPLICATION

US Patent Application No: 20060074500. Fault tolerant control system. Sanjeev Naik, Pradyumna K. Mishra and Soila Pertet. April 6, 2006 (based on work done during a summer internship at General Motors in 2004).

PUBLICATIONS

Journal Papers

MEAD: Support for Real-Time Fault-Tolerant CORBA. Priya Narasimhan, Tudor A. Dumitras, Aaron M. Paulos, Soila M. Pertet, Charles F. Reverte, Joseph G. Slember and Deepti Srivastava. Concurrency and Computation: Practice and Experience, vol. 17, no. 12, 2005, pp. 1527-1545; Copyright 2005 John Wiley and Sons

Conference papers & workshops

Implementing Prato, a database on demand service. Soila Pertet, John Wilkes and Jay Wylie. Invited paper - Workshop on Policy-Based Autonomic Computing (PBAC), Jacksonville, FL, June 2007.
Fingerpointing Correlated Failures in Replicated Systems. Soila M. Pertet, Rajeev Gandhi, Priya Narasimhan. USENIX Workshop on Tackling Computer Systems Problems with Machine Learning Techniques (SysML), Cambridge, MA, April 2007.
Fault-Tolerant Propulsion-By-Wire System for Vehicle with Electric Wheel Motors. Sanjeev Naik, P.K. Mishra, Soila Pertet, David Beaulieu and Jeff Wolak. American Control Conference, Minneapolis, MN, June 2006.
Handling Propagating Faults: The Case for Topology-Aware Fault-Recovery. Soila Pertet and Priya Narasimhan. DSN Workshop on Hot Topics in System Dependability, Yokohama, Japan, June 2005.
Proactive Recovery in Distributed CORBA Applications. Soila Pertet and Priya Narasimhan. IEEE Conference on Dependable Systems and Networks (DSN), Florence, Italy, June 2004.

Short papers

Prato: databases on demand. Soila Pertet, Priya Narasimhan, John Wilkes and Jay Wylie. Poster - International Conference on Autonomic Computing (ICAC), Jacksonville, FL, June 2007.
Proactive Problem Determination in Transaction-Oriented Applications. Soila Pertet, Priya Narasimhan, Anca Sailer and Gautam Kar. DSN Fast Abstract - Yokohama, Japan, June 2005.

Technical reports

Causes of Failure in Web Applications. Soila Pertet and Priya Narasimhan. Carnegie Mellon University Parallel Data Lab Technical Report CMU-PDL-05-109. December 2005.
Proactive Fault-Recovery in Distributed Systems. Soila Pertet. Master's Thesis, Department of Electrical & Computer Engineering, Carnegie Mellon University, May 2004.