Our work has multiple facets. Our projects often combine log analysis, machine
learning, systems development, visualization, and fault tolerance.
Failure Diagnosis for Cloud Computing. We analyze the behavior of cloud-computing
platforms (Hadoop in particular) to understand how to localize the node that is the
source of a performance problem. We have developed log-analysis techniques (SALSA),
visualization techniques (Mochi), and black-box metric analysis (Ganesha), among
other techniques.
Failure Diagnosis for HPC Systems. We are focusing on failure diagnosis for
high-performance file systems such as PVFS and Lustre. We have developed black-box
failure-analysis techniques that examine the OS-level metrics from these systems, as
well as techniques that localize problems by analyzing system calls alone.
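
As a rough illustration of black-box localization from OS-level metrics, the sketch below compares each node's metric against its peers and flags the nodes that deviate most. It is a generic peer-comparison heuristic written for this page, not the algorithm used by Ganesha or by our file-system diagnosis work. The function name `localize_outlier_nodes`, the `threshold` parameter, and the sample CPU values are all hypothetical.

```python
from statistics import median


def localize_outlier_nodes(metrics, threshold=3.0):
    """Return the nodes whose typical metric value deviates from their peers.

    metrics maps a node name to a list of samples of one OS-level metric
    (for example, CPU utilization in percent). Assumption: under normal
    operation, peer nodes doing similar work show similar values.
    """
    # Summarize each node by the median of its samples.
    per_node = {node: median(samples) for node, samples in metrics.items()}

    # Peer baseline: the median across nodes, and the median absolute
    # deviation (MAD) as a robust measure of spread.
    center = median(per_node.values())
    mad = median(abs(value - center) for value in per_node.values())

    if mad == 0:
        # Most nodes agree exactly; any node that differs is suspect.
        return sorted(n for n, v in per_node.items() if v != center)

    # Flag nodes that lie more than `threshold` MADs from the baseline.
    return sorted(
        n for n, v in per_node.items() if abs(v - center) / mad > threshold
    )


# Hypothetical CPU-utilization samples from four nodes.
samples = {
    "node1": [20, 22, 21],
    "node2": [19, 21, 20],
    "node3": [21, 23, 22],
    "node4": [90, 95, 92],
}
print(localize_outlier_nodes(samples))  # prints ['node4']
```

Medians are used here instead of means so that a single misbehaving node does not shift the baseline it is compared against.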
Failure Diagnosis for Automotive Systems. We are studying embedded automotive
systems, focusing on the specific kinds of failures that occur and that need to be
diagnosed in that domain. We are interested in failure diagnosis for improved runtime
Diagnosis-Driven Online Recovery. We are focusing on online (and not just offline)
diagnosis, primarily in support of triggering more informed fault-recovery. We have
aimed to scale the diagnosis approaches that we have developed to support rapid
recovery and to demonstrate how diagnosis driven by recovery is superior to naive
Visualization for Failure Diagnosis. Recognizing that system administrators need
tools for large-scale troubleshooting, we have developed visualization techniques and
tools to support the rapid manual localization of performance problems by displaying
multiple views of a system's execution.