BRIEF BIO
I joined Carnegie Mellon University in 2003 as a
Systems faculty member in the Electrical and Computer
Engineering Department and the Information Networking
Institute. I was previously a Research Staff Member with
Motorola's Broadband Communications Division in San Diego,
CA, where I was involved in the H.264 video-compression
standardization activity. I received a Motorola
Outstanding Performance award in 2002 in recognition of my
contributions to global standardization activities. Prior
to this, I received my Ph.D. in March 2000 from the
University of California, Santa Barbara and my
B.Tech. degree from IIT Bombay in 1994.
RESEARCH
My research interests are in the area of problem diagnosis or
fingerpointing in large-scale distributed systems. Problem
diagnosis involves instrumenting a given system to gather meaningful
data, and analyzing the collected data to detect the source or even the root
cause of the problems in the system. Fingerpointing is a challenging
problem because the distributed nature of processing/computation can cause the
problem to affect the behavior of all the nodes in the system. We are currently
working on identifying performance problems in MapReduce systems such as Hadoop,
and file systems such as PVFS, Lustre, BFS and CoreFS. Our current fingerpointing algorithms
use black-box data and/or white-box data to fingerpoint a faulty node in Hadoop and
in these file systems.
My current research projects include
the following:
- Problem Diagnosis in PVFS/Lustre: Automatically diagnosing performance
problems in parallel file systems by identifying, gathering and analyzing either OS-level
black-box performance metrics or system call attributes across parallel file systems.
- Kahuna: Diagnosing performance problems in Hadoop by comparing OS-level performance
metrics and Hadoop's log statistics across all the nodes of a cluster to fingerpoint a faulty node.
- SALSA: Analyzing Logs as StAte machines: SALSA examines Hadoop logs to derive a state-machine
view of the system's execution along with control-flow and data-flow models and related statistics. The state-machine
view of Hadoop is then used for failure diagnosis and for visualizing Hadoop's distributed behavior.
- Gumshoe: Failure diagnosis in distributed systems through the application
of statistical anomaly-detection algorithms and machine-learning
techniques such as clustering.
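As a rough illustration of the peer-comparison idea behind these projects, the sketch below flags a node whose metric deviates strongly from the median of its peers. It is a minimal example under stated assumptions, not the published Kahuna or Gumshoe algorithm: the function name `fingerpoint`, the threshold value, and the sample I/O-wait numbers are all hypothetical.

```python
from statistics import median


def fingerpoint(metrics, threshold=3.0):
    """Return the nodes whose metric deviates strongly from the peer median.

    `metrics` maps a node name to one sampled value (hypothetical data).
    The median-absolute-deviation test used here is an assumed stand-in
    for whatever comparison a real diagnosis system would apply.
    """
    values = list(metrics.values())
    med = median(values)
    # Median absolute deviation (MAD): a robust estimate of normal spread.
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # At least half the peers sit exactly on the median;
        # flag any node that differs from it at all.
        return sorted(n for n, v in metrics.items() if v != med)
    return sorted(
        n for n, v in metrics.items() if abs(v - med) / mad > threshold
    )


# Hypothetical per-node I/O-wait percentages from a five-node cluster.
cpu_iowait = {
    "node1": 4.1,
    "node2": 3.8,
    "node3": 4.4,
    "node4": 27.5,
    "node5": 4.0,
}
suspects = fingerpoint(cpu_iowait)
print(suspects)
```

The sketch relies on the assumption that healthy nodes in a cluster behave alike under the same workload, so that a node far from the peer median is a reasonable suspect.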
I am fortunate to work with talented students such as
Jiaqi Tan, Soila Kavulya, Michael Kasick and Xinghao Pan.
I am also affiliated with the Center for Sensed
Critical Infrastructure Research (CenSCIR) and the
Parallel Data Lab (PDL) at CMU.
RECENT PUBLICATIONS
- Visual, Log-based Causal Tracing for Performance Debugging of MapReduce Systems
Jiaqi Tan, Soila Kavulya, Rajeev Gandhi and Priya Narasimhan, to be presented at IEEE International Conference on Distributed Computing Systems (ICDCS), Genoa, Italy, June 2010
- An Analysis of Traces from a Production MapReduce Cluster
Soila Kavulya, Jiaqi Tan, Rajeev Gandhi and Priya Narasimhan, to be presented at IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), Melbourne, Australia, May 2010
- Kahuna: Problem Diagnosis for MapReduce-Based Cloud Computing Environments
Jiaqi Tan, Xinghao Pan, Soila Kavulya, Rajeev Gandhi and Priya Narasimhan, to be presented at IEEE/IFIP Network Operations and Management Symposium (NOMS), Osaka, Japan, April 2010
- Black-Box Diagnosis in Parallel File Systems
Michael P. Kasick, Jiaqi Tan, Rajeev Gandhi and Priya Narasimhan, to be presented at USENIX Conference on File and Storage Technologies (FAST), San Jose, CA, February 2010
- System-Call Based Problem Diagnosis for PVFS
Michael Kasick, Keith Bare, Eugene Marinelli, Rajeev Gandhi and Priya Narasimhan, Fifth Workshop on Hot Topics in System Dependability (HotDep), Lisbon, Portugal, June 2009
The list of all my publications can be found here.
TEACHING
I teach the Fundamentals of Embedded Systems
(18-342/14-642) course at Carnegie Mellon University.
This practical, hands-on course introduces students to
the basic building blocks and the underlying
scientific principles of embedded systems. The course
covers both the hardware and software aspects of
embedded processor architectures, along with operating
system fundamentals, such as virtual memory,
concurrency, task scheduling and
synchronization. Through a series of laboratory
projects involving state-of-the-art processors,
students learn to understand implementation
details and to write assembly-language and C programs
that implement core embedded OS functionality, and
that control and debug features such as timers,
interrupts, serial communications, flash memory,
device drivers and other components used in typical
embedded applications. Relevant topics, such as
optimization, profiling,
and real-time operating systems, are also covered.
PATENTS
- Co-inventor, Frequency coefficient scanning paths for coding digital video content.
United States Patent: 7088867. August 2006.
- Co-inventor, Macroblock level adaptive frame/field coding for digital video content.
United States Patent: 6980596. December 2005.