ECE Department
Carnegie Mellon University
Pittsburgh, PA 15213
Current Project
Dependable, Dynamic Upgrades in Distributed Systems
Implementing online software upgrades (changes in the behavior,
configuration, code, data or topology of a running application)
is essential for enabling the self-regulating, autonomic management
and maintenance of enterprise computer systems. Such dynamic change-management
is difficult to perform because of the complex interactions between
the distributed components:
The dependencies between system components are not always well
documented and are very hard to track. A dynamic upgrading system
must be careful not to disable existing applications by breaking
unknown dependencies, while updating all the components required
by the new version of the application being installed.
When upgrading distributed systems, this problem is even more
acute because there are additional sources of dependencies (e.g.,
networking protocols, middleware, routes). For example, upgrading
a component to a version that exposes a modified RPC API (e.g., a new COM interface, a modified CORBA object or a WSDL method with different
parameters) requires patching all the entities that
reference the upgraded component, taking into account the fact
that sometimes the old and new APIs may be incompatible.
A related problem, specific to distributed systems, is that
sometimes upgrades have to be performed across mutually-distrustful
administrative domains, while preserving the same correctness
invariants and coherence of the forward and reverse dependencies.
Dynamic upgrades must preserve the correctness of the system,
which often requires the transfer of state, composed of persistent
and even transient data. Many applications require massive amounts
of data to be converted to new schemas, which happens over a long
period of time during which clients may request transactions involving
the same data being converted.
Dynamic upgrading systems must also assess the impact of the
changes on the running services and determine the most opportune
moment to apply the upgrade to avoid significant penalties due
to degraded performance and dependability, while improving
the value of the infrastructure according to some well-defined
metrics.
The upgrading process must be reliable and tolerate faults
without the loss of data or functionality.
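The dependency problem described above can be illustrated with a small sketch. It is not taken from the upgrade system itself: the `DependencyGraph` type, the `find_affected` function and the fixed `MAX_COMPONENTS` limit are hypothetical names chosen for illustration, and the sketch assumes that all dependencies are already known, which is precisely what is hard to guarantee in practice. Given a component whose RPC API changes, it walks the reverse dependencies to find every component that may need patching.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_COMPONENTS 8   /* assumption: small, fixed-size system */

/* depends_on[a][b] is true when component a references component b. */
typedef struct {
    size_t count;
    bool depends_on[MAX_COMPONENTS][MAX_COMPONENTS];
} DependencyGraph;

/*
 * Marks in `affected` every component that directly or transitively
 * references `upgraded`. Returns the number of affected components,
 * not counting the upgraded component itself.
 */
size_t find_affected(const DependencyGraph *g, size_t upgraded,
                     bool affected[MAX_COMPONENTS])
{
    size_t stack[MAX_COMPONENTS];   /* each component is pushed at most once */
    size_t top = 0, found = 0;

    assert(g->count <= MAX_COMPONENTS && upgraded < g->count);
    for (size_t i = 0; i < MAX_COMPONENTS; i++)
        affected[i] = false;

    stack[top++] = upgraded;
    while (top > 0) {
        size_t current = stack[--top];
        for (size_t client = 0; client < g->count; client++) {
            if (g->depends_on[client][current] &&
                !affected[client] && client != upgraded) {
                affected[client] = true;   /* client must be patched too */
                found++;
                stack[top++] = client;
            }
        }
    }
    return found;
}
```

For example, if component 1 references component 0 and component 2 references component 1, upgrading component 0 marks components 1 and 2 as affected, while an unrelated component 3 is left alone.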
This work is part of the Middleware
for Embedded Adaptive Dependability (MEAD) project. The goal
of this project is to enhance distributed CORBA applications with
new capabilities, including transparent yet tunable fault tolerance
in real time, proactive dependability, and resource-aware system adaptation
to crash, communication and timing faults, with scalable and fast
fault detection and fault recovery.
Versatile dependability defines a hierarchy of low-level and high-level
control knobs. Low-level knobs control the internal fault-tolerant
mechanisms of the infrastructure and typically correspond to discrete
(e.g., the degree of replication) or even non-countable sets (e.g.,
replication styles). In contrast, high-level knobs should
regulate external properties (e.g., scalability, availability) that
are relevant to the system's users, hide internal implementation
details, and have a linear transfer characteristic
with unsurprising effects for the users.
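As a rough illustration of how a high-level knob might hide a low-level one, the sketch below maps an availability target onto the degree of replication. This is not MEAD's actual mechanism: the function name `replicas_for_availability` is hypothetical, and the model assumes that replicas fail independently and that the service is available whenever at least one replica is up, so that n replicas with individual availability a yield 1 - (1 - a)^n.

```c
#include <assert.h>

/*
 * Returns the smallest degree of replication (low-level knob) that meets
 * the requested availability (high-level knob), or 0 if the target cannot
 * be reached with at most max_replicas replicas.
 * Assumption: independent replica failures, any single replica suffices.
 */
int replicas_for_availability(double replica_availability, double target,
                              int max_replicas)
{
    double all_down = 1.0;   /* probability that every replica so far is down */

    assert(replica_availability > 0.0 && replica_availability < 1.0);
    for (int n = 1; n <= max_replicas; n++) {
        all_down *= 1.0 - replica_availability;
        if (1.0 - all_down >= target)
            return n;
    }
    return 0;
}
```

Under these assumptions, a user who asks for 99.5% availability from replicas that are each 90% available is given three replicas without ever having to reason about replication directly.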
I have implemented the first version of MEAD. My contributions
to the project include:
Defining and implementing a "control knob" for tuning
the system scalability;
Designing a mechanism for switching between active and passive
replication on-the-fly;
Discovering the "magical 1% effect": the unpredictability
(in terms of uncontrollably-high end-to-end latencies) of a fault-tolerant
CORBA application is isolated to 1% of the remote invocations.
Stochastic communication is a new communication paradigm for on-chip networks.
As opposed to traditional system-on-chip (SoC) communication architectures,
which are organized around shared buses, networks-on-chip (NoCs)
place the various modules of an SoC
at the nodes of a regular structure (for example, a rectangular grid)
and connect them with a micro-network. This requires more sophisticated
communication protocols that take into account the almost
random faults specific to modern deep-sub-micron (DSM) technologies,
faults that current CAD
tools cannot handle. Relaxing the requirement of 100% correctness for devices and
interconnects would drastically reduce design costs but, at
the same time, requires that SoCs
be designed with some degree of system-level fault tolerance. Stochastic
communication defines a new class of protocols for the on-chip networks,
based on a randomized broadcast algorithm. Our results show that stochastic
communication is resilient to the faults specific to DSM
technologies, while maintaining a constant or gracefully degrading
latency. The design methodology associated with stochastic communication
provides fault-tolerance and high performance while drastically simplifying
the task of the designer. The wide range of applicability of this
method, combined with the current trend toward several clock/frequency/voltage
domains on a single chip, leads us to believe that our technique will
create a major paradigm shift in SoC
design. Stochastic communication continues to be developed in the
System-Level Design research
group at Carnegie Mellon.
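The idea of a randomized broadcast on a grid of tiles can be sketched with a small simulation. This is a generic gossip-style model written for illustration, not the published stochastic communication protocol: the 4x4 grid, the `gossip_rounds` function, the `forward_prob` and `loss_prob` parameters and the built-in pseudo-random generator are all assumptions. In every round, each informed tile forwards the message to each of its four neighbors with probability `forward_prob`, and each transmission is lost with probability `loss_prob`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define ROWS 4
#define COLS 4

/* Small linear congruential generator so that runs are reproducible. */
static uint32_t lcg_state = 12345u;

static double next_uniform(void)
{
    lcg_state = lcg_state * 1664525u + 1013904223u;
    return (lcg_state >> 8) / 16777216.0;   /* value in [0, 1) */
}

/*
 * Spreads one message from tile (0, 0). Returns the number of rounds
 * needed to inform every tile, or -1 if max_rounds elapse first.
 */
int gossip_rounds(double forward_prob, double loss_prob, int max_rounds)
{
    static const int dr[4] = { -1, 1, 0, 0 };
    static const int dc[4] = { 0, 0, -1, 1 };
    bool informed[ROWS][COLS] = { { false } };
    int informed_count = 1;

    informed[0][0] = true;
    for (int round = 1; round <= max_rounds; round++) {
        bool next[ROWS][COLS];
        memcpy(next, informed, sizeof next);
        for (int r = 0; r < ROWS; r++) {
            for (int c = 0; c < COLS; c++) {
                if (!informed[r][c])
                    continue;   /* only tiles informed before this round send */
                for (int d = 0; d < 4; d++) {
                    int nr = r + dr[d], nc = c + dc[d];
                    if (nr < 0 || nr >= ROWS || nc < 0 || nc >= COLS)
                        continue;
                    bool sent = next_uniform() < forward_prob;
                    bool lost = next_uniform() < loss_prob;
                    if (sent && !lost && !next[nr][nc]) {
                        next[nr][nc] = true;
                        informed_count++;
                    }
                }
            }
        }
        memcpy(informed, next, sizeof informed);
        if (informed_count == ROWS * COLS)
            return round;
    }
    return -1;
}
```

With `forward_prob` set to 1 and no losses, the message needs exactly six rounds to reach the opposite corner of the 4x4 grid (its Manhattan distance from the source). Lower forwarding probabilities or higher loss rates can only lengthen the spread, which is the kind of graceful latency degradation the sketch is meant to make tangible.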
Assistive Technologies
Eye of the Beholder: Text-Recognition System for the Visually-Impaired
Blind and visually-impaired people cannot access essential
information in the form of written text in our environment (e.g.,
on restaurant menus, street signs, door labels, product names and
instructions, expiration dates). We have developed a mobile text-recognition
system capable of extracting written information from a wide variety
of sources and communicating it on demand to the user. The user needs
no additional hardware except an ordinary, Internet-enabled mobile
camera-phone - a device that many visually-impaired individuals already
own. This approach fills a gap in assistive technologies for the visually-impaired
because it makes users aware of textual information not available
to them through any other means.
Color-Blindness Correction System
Tests performed during the past 50 years on the few people with one
normal eye and one eye affected by color blindness have led to
the creation of an approximate model for simulating the effects of
this genetic deficiency. Based on this model, we have discovered that
images can be enhanced for color-blind viewers by applying color-space
filtering and processing. Normal images may contain patterns
that are completely hidden from an eye with deficient vision. However,
by applying our technique, these
patterns are revealed, with only minimal changes to the content of
the images. This result also shows that perception-based image and
video encoding is possible, by keeping only the color information
that is relevant to the viewer's eye.
Undergraduate Research Projects
Open Source Atomic Broadcast Package
Atomic Broadcast is a fundamental communication primitive
for reliable distributed systems. The specification of Atomic
Broadcast states that the same messages should be delivered in the
same order at all the receivers. This is a very difficult problem,
proven impossible to solve deterministically in a fully asynchronous system
where processes may crash. I took
part in an effort to develop Atom, an open source Atomic Broadcast
package for UNIX, based on unreliable failure detectors.
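The ordering requirement in the specification above can be made concrete with a receiver-side sketch. This is not Atom's algorithm: it assumes that some other mechanism has already assigned each message a global sequence number (for instance a fixed sequencer, which is a simpler technique than the failure-detector-based approach), and it covers only the hold-back logic. The `Receiver` type, the `receiver_accept` function and the `WINDOW` size are hypothetical names for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define WINDOW 16   /* assumption: at most 16 messages are buffered ahead */

typedef struct {
    unsigned next_seq;          /* next sequence number to deliver */
    bool     present[WINDOW];   /* is the slot holding a buffered message? */
    int      payload[WINDOW];
} Receiver;

void receiver_init(Receiver *r)
{
    r->next_seq = 0;
    for (size_t i = 0; i < WINDOW; i++)
        r->present[i] = false;
}

/*
 * Accepts a message tagged with global sequence number `seq` and copies
 * into `out` every message that becomes deliverable, in sequence order.
 * Returns the number of messages delivered by this call.
 */
size_t receiver_accept(Receiver *r, unsigned seq, int payload,
                       int out[], size_t out_cap)
{
    size_t delivered = 0;

    /* Ignore duplicates and messages beyond the buffering window. */
    if (seq < r->next_seq || seq - r->next_seq >= WINDOW)
        return 0;
    r->present[seq % WINDOW] = true;
    r->payload[seq % WINDOW] = payload;

    /* Deliver the longest gap-free run starting at next_seq. */
    while (delivered < out_cap && r->present[r->next_seq % WINDOW]) {
        out[delivered++] = r->payload[r->next_seq % WINDOW];
        r->present[r->next_seq % WINDOW] = false;
        r->next_seq++;
    }
    return delivered;
}
```

Because every receiver delivers strictly in sequence-number order, two receivers that see messages 1, 2, 0 and 0, 2, 1 respectively both deliver 0, 1, 2. The hard part, which this sketch leaves out, is agreeing on the sequence numbers when processes can crash.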
The Argo search engine
This was an undergraduate project to develop a Web search engine
and crawler in Java. Argo was able to answer a request in 0.3 s
(in 2002), maintain a distributed database and analyze the
indexed sites in parallel.
Misc
What does this C program print out?
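#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

int val = 3;

void Exit(int val) { printf("%d", val); exit(0); }

void usr1_handler(int sig) { Exit(val); }

int main() {
    int pid;
    signal(SIGUSR1, usr1_handler);
    if ((pid = fork()) == 0) {
        setpgid(0, 0);   /* child becomes leader of a new process group */
        if (fork()) Exit(val + 1);
        else Exit(val - 1);
    }
    kill(-pid, SIGUSR1);   /* parent signals the child's process group */
}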
If you think you know the answer ... think again. Then read this
page.