Team #4

18-749: Fault-Tolerant Distributed Systems
Spring 2006

 


Fault-Tolerant Design – Architecture

Replication Manager:

Fault Injector:

1) The user tells the fault injector which server to kill and how often.

2) The fault injector randomly kills servers at a predefined interval.

Server Beans:

Client:

Database:
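
The two fault injector modes listed above can be sketched roughly as follows. This is an illustrative sketch only, not the team's actual injector: the class name FaultInjectorSketch, the killer callback (a stand-in for whatever mechanism actually kills a server process), and the rounds parameter are all assumptions made for the example.

```java
import java.util.List;
import java.util.Random;
import java.util.function.Consumer;

/** Hypothetical sketch of the two fault-injection modes; all names are illustrative. */
class FaultInjectorSketch {
    private final Consumer<String> killer; // assumed hook that kills the named server
    private final Random random;

    FaultInjectorSketch(Consumer<String> killer, long seed) {
        this.killer = killer;
        this.random = new Random(seed);
    }

    /** Mode 1: kill the server the user named. */
    String killChosen(String server) {
        killer.accept(server);
        return server;
    }

    /** Mode 2: kill a randomly chosen server from the list. */
    String killRandom(List<String> servers) {
        String victim = servers.get(random.nextInt(servers.size()));
        killer.accept(victim);
        return victim;
    }

    /**
     * Injects 'rounds' faults, waiting intervalMillis between them.
     * If chosenServer is null, victims are picked at random (mode 2);
     * otherwise the same user-chosen server is killed each time (mode 1).
     */
    void run(List<String> servers, String chosenServer, int rounds, long intervalMillis) {
        for (int i = 0; i < rounds; i++) {
            if (chosenServer != null) {
                killChosen(chosenServer);
            } else {
                killRandom(servers);
            }
            if (i < rounds - 1) {
                try {
                    Thread.sleep(intervalMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt and stop
                    return;
                }
            }
        }
    }
}
```

Passing the kill mechanism in as a callback keeps the timing logic separate from how a server is actually terminated, which the source does not specify.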

 


Fault-Tolerant Baseline

 

Summary: The fault-tolerant baseline currently uses a replication manager that automatically launches, heartbeats, and re-launches servers.

This ensures that a certain number of servers are always running and that one of them is the primary. The replication manager and
client can both handle a failure: the replication manager switches the primary and re-launches the failed server, and the client reconnects to the
new primary. Currently, process crashes are handled. Node crashes are detected, but a re-launch may still be attempted on the crashed
node. Duplicate detection is handled with transaction IDs for the non-idempotent functions.
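
As a rough illustration of duplicate detection with transaction IDs, the sketch below caches the reply for each completed transaction ID, so a retried non-idempotent request returns the earlier reply instead of being applied twice. It is a minimal sketch under stated assumptions, not the team's server bean code: the class DedupServerSketch, the method addToCounter, and the in-memory counter standing in for database state are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical server-side duplicate detection; all names are illustrative. */
class DedupServerSketch {
    // Maps each completed transaction ID to the reply that was sent for it.
    private final Map<String, Integer> completed = new HashMap<>();
    private int counter = 0; // stand-in for state that would live in the database

    /** Non-idempotent operation: applying it twice would change the result. */
    synchronized int addToCounter(String transactionId, int amount) {
        Integer cachedReply = completed.get(transactionId);
        if (cachedReply != null) {
            return cachedReply; // duplicate request: replay the earlier reply
        }
        counter += amount;
        completed.put(transactionId, counter);
        return counter;
    }
}
```

In a replicated deployment the table of completed transaction IDs would presumably need to live in shared state, such as the database, so that a newly promoted primary can recognize retries. The sketch keeps it in memory for brevity.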

 

Fault-Tolerant Baseline demo code available here

 


Fault-Tolerance Evaluation

 

Chief Experimenter: Jon Gray

Necessary Implementation Changes for Evaluation

·        Create an automatic client, written in Java, that does the following:

o   Allows for a constant, configurable inter-request time

o   These two functions also need to be updated to accept a parameter
containing the client’s hostname, so that it can be recorded in the server’s
log files.
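
A minimal sketch of such an automatic client is shown below, assuming a fixed pause between requests and a remote call that takes the client's hostname as a parameter. The class AutoClientSketch, the remoteCall function, and its signature are hypothetical; the real client would invoke the server beans' actual methods.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.function.BiFunction;

/** Hypothetical automatic client; all names are illustrative. */
class AutoClientSketch {
    // Assumed remote call: (clientHostname, requestNumber) -> reply, or null on failure.
    private final BiFunction<String, Integer, String> remoteCall;
    private final String hostname;
    private final long interRequestMillis; // constant, configurable pause between requests

    AutoClientSketch(BiFunction<String, Integer, String> remoteCall,
                     String hostname, long interRequestMillis) {
        this.remoteCall = remoteCall;
        this.hostname = hostname;
        this.interRequestMillis = interRequestMillis;
    }

    /** Looks up the local hostname so the server can record it in its logs. */
    static String localHostname() {
        try {
            return InetAddress.getLocalHost().getHostName();
        } catch (UnknownHostException e) {
            return "unknown-host"; // fallback when the name cannot be resolved
        }
    }

    /** Sends 'count' requests and returns how many replies were received. */
    int run(int count) {
        int replies = 0;
        for (int i = 0; i < count; i++) {
            if (remoteCall.apply(hostname, i) != null) {
                replies++;
            }
            if (i < count - 1) {
                try {
                    Thread.sleep(interRequestMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt and stop
                    break;
                }
            }
        }
        return replies;
    }
}
```

A client would typically be constructed with AutoClientSketch.localHostname() as the hostname argument. Note that this sketch pauses for a fixed time after each reply, so the spacing between request starts also includes the request latency.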

Necessary Scripts Required

·        Create a script that will run varying numbers of Java auto clients with different values
for inter-request time and reply message size.  Specifically,

Design

·        The script for launching clients and running through all of the tests will be written in either
Perl or a shell scripting language.

 

Completed Fault-Tolerance Evaluation Files

Data: team4data.tar.gz

Report: team4Analysis.pdf