Team #6: Team Slackers

18-749: Fault-Tolerant Distributed Systems
Spring 2006



Table of Contents
  1. Project Information
    1. Team Members
    2. Team Roles
    3. Project Title
    4. Baseline Application Description
    5. CVS
    6. Configuration
    7. Third-party software, if any (databases)
    8. Baseline Application Features
    9. Dependability Requirements
    10. Real-Time Requirements
    11. Performance Requirements
  2. Baseline Application
    1. Code Documentation
    2. Scenarios/Interactions
    3. Current Status
    4. Downloads
    5. Feedback
      1. 2/11/2006: Feedback on project proposal
      2. Feedback on project interfaces and end-to-end use case
      3. Feedback on end-to-end use case
      4. 3/22/2006: Checkpoint 1 Presentation
  3. Fault-Tolerant Baseline Application
    1. Interfaces
      1. Tiers
      2. Methods
      3. Attributes
      4. Exceptions
      5. Servers and Ports
    2. Database Tables
      1. Client
      2. Level
      3. Lot
      4. LotDistance
    3. Scenarios/Interactions
    4. Code Documentation
    5. Current Status
    6. Downloads
    7. Fault Tolerance Design
    8. Fault-Tolerance Evaluation
    9. Feedback 4/3/2006
  4. Real-Time Fault-Tolerant Baseline Application
    1. Scenarios/Interactions
    2. Current Status
    3. Feedback 5/2/2006
  5. High-Performance Real-Time Fault-Tolerant Baseline Application
    1. Scenarios/Interactions
    2. Code Documentation
    3. Current Status
    4. Downloads (Final Demo Deliverables)

Project Information

Team Members

Team Roles

Responsibility              Hyunwoo  Karim  Puneet  Tanmay  Steven
Project Management          -        -      -       X       X
Requirements Specification  X        -      X       -       -
Architecture Design         X        X      -       X       -
Database Design             -        X      X       -       X
Implementation - Client     -        -      X       X       -
Implementation - Server     X        X      -       -       X
Testing                     X        X      X       X       X
Tool Support                -        -      -       -       X
Performance Analysis        -        X      -       X       X
Real-time Analysis          X        X      X       -       -
Reliability Analysis        X        -      X       X       -
Presentation                X        X      X       X       X

Project Title

Park'n Park

Baseline Application Description

A system that manages parking lots by tracking how many spaces are available in each lot and recommending other lots when the lot a driver is at is full.

CVS

Configuration

Third-party software, if any (databases)

Baseline Application Features

  1. User enters a lot at an entry level and gets the available levels. In the current version, the entry level is always the first level.
  2. The system will let the driver know at the entrance whether there is space available in the lot or not.
  3. If there is space available in the lot, the system will let the driver know which levels have available spaces after the driver enters the lot.
  4. If no space is available in the lot, the system will let the driver know the nearest parking lots that have available space.
  5. If no space is available in any of the lots, the system will display an appropriate message.
  6. Once in a lot, the user can move from one level to either the next higher or lower level when such a level exists.
  7. In a lot, the user can only exit at an exit level, which is currently defined as the first level.
  8. The user will be informed when attempting to enter a nonexistent lot.
  9. The user cannot exit a lot if their car is not in one.
  10. The user cannot enter a lot if their car is already in a lot. The car must leave its current lot first.
  11. The server will report unrecoverable database conditions and other major problems to the client to inform them that service is unavailable.
  12. The client, which represents a car, can choose to drive up to any lot in the system.
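
Features 6, 7, 9, and 10 above amount to a small state machine for a single car. The following is a minimal sketch of those rules, not the project's actual code: the CarSession class, its method names, and the choice of IllegalStateException are all hypothetical and chosen only for illustration.

```java
/** Hypothetical in-memory model of one car's position (illustrative only). */
final class CarSession {
    /** Entry and exit level; the features define both as the first level. */
    static final int ENTRY_LEVEL = 1;

    private String currentLot;   // null while the car is outside every lot
    private int currentLevel;
    private int levelCount;

    boolean inLot() { return currentLot != null; }

    int level() { return currentLevel; }

    /** Feature 10: a car that is already in a lot must leave it first. */
    void enter(String lot, int levelsInLot) {
        if (inLot()) {
            throw new IllegalStateException("car must leave lot " + currentLot + " first");
        }
        if (levelsInLot < 1) {
            throw new IllegalArgumentException("a lot needs at least one level");
        }
        currentLot = lot;
        currentLevel = ENTRY_LEVEL;
        levelCount = levelsInLot;
    }

    /** Feature 6: move to the next higher level only when it exists. */
    void moveUp() {
        requireInLot();
        if (currentLevel == levelCount) {
            throw new IllegalStateException("no higher level exists");
        }
        currentLevel++;
    }

    /** Feature 6: move to the next lower level only when it exists. */
    void moveDown() {
        requireInLot();
        if (currentLevel == ENTRY_LEVEL) {
            throw new IllegalStateException("no lower level exists");
        }
        currentLevel--;
    }

    /** Features 7 and 9: exit only from the exit level, and only when in a lot. */
    void exit() {
        requireInLot();
        if (currentLevel != ENTRY_LEVEL) {
            throw new IllegalStateException("cars can only exit at level " + ENTRY_LEVEL);
        }
        currentLot = null;
    }

    private void requireInLot() {
        if (!inLot()) {
            throw new IllegalStateException("car is not in a lot");
        }
    }
}
```

In this sketch a rejected operation leaves the car's state unchanged, so a caller can report the problem to the driver and continue.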

Dependability Requirements

  1. The driver will experience at most a five-second delay upon a single server failure. Service will continue after the five-second delay.
  2. The system will be available 24x7 without any critical system failures, defined as failures that prevent the system from performing its duties to its clients.
  3. The result displayed to the driver will be within plus or minus one parking space of the actual physical state of the lot.

Real-Time Requirements

  1. On the client side, the driver will receive a result within two seconds, both under normal operating conditions with at least one server running and during a fault when at least one backup server is operating normally. This applies only when no network faults are in progress and no more than ten clients are using the system.
  2. Under normal operating conditions and when no faults occur, the middle tier will obtain results from the database on read-only queries while a car is entering or leaving a lot within 15 milliseconds when at most 10 clients are using the system and at most five parking lots are listed in the database.
  3. Under normal operating conditions and when no faults occur, the middle tier will obtain results from the database on read-only queries while a car is entering or leaving a level within 15 milliseconds when at most 10 clients are using the system and at most five parking lots are listed in the database.
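
One way to check requirements 2 and 3 is to time each read-only query in the middle tier and compare the elapsed time with the 15 millisecond bound. The sketch below assumes a hypothetical DeadlineCheck helper. Only the 15 ms figure comes from the requirements; the helper, its method names, and the use of System.nanoTime() are illustrative.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical helper for checking a query against a latency bound. */
final class DeadlineCheck {
    /** Bound on read-only database queries, from real-time requirements 2 and 3. */
    static final long QUERY_DEADLINE_MILLIS = 15;

    /** Runs the query once and returns the elapsed wall-clock time in nanoseconds. */
    static <T> long timeNanos(Supplier<T> query) {
        long start = System.nanoTime();
        query.get();
        return System.nanoTime() - start;
    }

    /** True when the measured time does not exceed the deadline. */
    static boolean withinDeadline(long elapsedNanos, long deadlineMillis) {
        return elapsedNanos <= TimeUnit.MILLISECONDS.toNanos(deadlineMillis);
    }
}
```

A single measurement says little about a bound, so a real evaluation would repeat the measurement many times and look at the worst case.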

Performance Requirements

  1. The system will support up to 10 simultaneous calls to getLots() under normal working conditions.
  2. The system will allow up to 50 cars to enter a parking lot each second under normal working conditions, when no faults occur, and when 10 or fewer clients are using the system, given that the lots do not become full.
  3. The system will support up to 25 parking lots.


Baseline Application

Code Documentation

Scenarios/Interactions

  1. User enters a lot at an entry level and gets the available levels. In the current version, the entry level is always the first level.
  2. The system will let the driver know at the entrance whether there is space available in the lot or not.
  3. If there is space available in the lot, the system will let the driver know which levels have available spaces after the driver enters the lot.
  4. If no space is available in the lot, the system will let the driver know the nearest parking lots that have available space.
  5. If no space is available in any of the lots, the system will display an appropriate message.
  6. Once in a lot, the user can move from one level to either the next higher or lower level when such a level exists.
  7. In a lot, the user can only exit at an exit level, which is currently defined as the first level.
  8. The user will be informed when attempting to enter a nonexistent lot.
  9. The user cannot exit a lot if their car is not in one.
  10. The user cannot enter a lot if their car is already in a lot. The car must leave its current lot first.
  11. The server will report unrecoverable database conditions and other major problems to the client to inform them that service is unavailable.
  12. The client, which represents a car, can choose to drive up to any lot in the system.

Current Status

Downloads

Feedback

2/11/2006: Feedback on project proposal

Feedback on project interfaces and end-to-end use case

Feedback on end-to-end use case

3/22/2006: Checkpoint 1 Presentation


Fault-Tolerant Baseline Application

Interfaces

Tiers

Methods

Client Manager Factory (one per server)

Client Manager (one per client)

Replication Manager (one per system)

Attributes

Exceptions

Servers and Ports

Database

girltalk:13306 for MySQL; this makes the --jdbc-url parameter equal to jdbc:mysql://girltalk:13306/ece749_team6
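
As an illustration of how the host, port, and database name combine into that URL, here is a minimal sketch using the standard java.sql API. The DatabaseUrl class and its method names are hypothetical, the user name and password are placeholders that the source does not give, and opening a connection additionally assumes a MySQL JDBC driver on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

/** Hypothetical helper; not part of the project's code. */
final class DatabaseUrl {
    /** Builds a MySQL JDBC URL of the form jdbc:mysql://host:port/database. */
    static String jdbcUrl(String host, int port, String database) {
        return "jdbc:mysql://" + host + ":" + port + "/" + database;
    }

    /** Opens a connection; the credentials are supplied by the caller. */
    static Connection open(String url, String user, String password) throws SQLException {
        return DriverManager.getConnection(url, user, password);
    }
}
```

Calling DatabaseUrl.jdbcUrl("girltalk", 13306, "ece749_team6") yields the URL shown above.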

Middle Tier

go:7779 for IIOP
chess:7779 for IIOP

Replication Manager

boggle:7780 for IIOP

Naming Service

boggle:7777 for bootstrap and 7778 for IIOP

Database Tables

Client

Level

Lot

LotDistance

Scenarios/Interactions

  1. User enters a lot at an entry level and gets the available levels. In the current version, the entry level is always the first level.
  2. The system will let the driver know at the entrance whether there is space available in the lot or not.
  3. If there is space available in the lot, the system will let the driver know which levels have available spaces after the driver enters the lot.
  4. If no space is available in the lot, the system will let the driver know the nearest parking lots that have available space.
  5. If no space is available in any of the lots, the system will display an appropriate message.
  6. Once in a lot, the user can move from one level to either the next higher or lower level when such a level exists.
  7. In a lot, the user can only exit at an exit level, which is currently defined as the first level.
  8. The user will be informed when attempting to enter a nonexistent lot.
  9. The user cannot exit a lot if their car is not in one.
  10. The user cannot enter a lot if their car is already in a lot. The car must leave its current lot first.
  11. The server will report unrecoverable database conditions and other major problems to the client to inform them that service is unavailable.
  12. The client, which represents a car, can choose to drive up to any lot in the system.
  13. If the primary server gets killed, then the replication manager will notice the failure and select a new primary server if a backup server exists. The client will notice the failure when it tries to perform its next server call. The client will then periodically poll the name service for the registered primary server and retry the server call.
  14. If the primary server gets killed and no backup servers are running, the replication manager will notice the failure and remove the active server name from the name service. The client will notice the failure when it tries to perform its next server call. The client will notice that no primary server is registered in the name service, notify the user that the system is down, and then exit.
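
The client-side behavior in scenarios 13 and 14 can be sketched as a retry loop around each server call. This is an illustration under stated assumptions, not the project's implementation: the FailoverClient class, the Supplier standing in for the name service lookup, the SystemDownException type, and the maxPolls limit are all hypothetical, and a failed call is modeled as any RuntimeException.

```java
import java.util.Optional;
import java.util.function.Function;
import java.util.function.Supplier;

/** Hypothetical client-side failover loop (illustrative only). */
final class FailoverClient {
    /** Raised when the client concludes that the system is down. */
    static final class SystemDownException extends RuntimeException {
        SystemDownException(String message) { super(message); }
    }

    /**
     * Looks up the registered primary, invokes the call on it, and on failure
     * waits pollMillis before looking up the primary again.
     *
     * @param primaryLookup stand-in for the name service query (assumption)
     * @param call          the remote invocation to perform on the primary
     * @param maxPolls      assumed upper limit on lookups; the source gives none
     */
    static <S, R> R callWithFailover(Supplier<Optional<S>> primaryLookup,
                                     Function<S, R> call,
                                     int maxPolls,
                                     long pollMillis) {
        for (int poll = 0; poll < maxPolls; poll++) {
            Optional<S> primary = primaryLookup.get();
            if (!primary.isPresent()) {
                // Scenario 14: no primary is registered, so the system is down.
                throw new SystemDownException("no primary server is registered");
            }
            try {
                return call.apply(primary.get());
            } catch (RuntimeException serverFailure) {
                // Scenario 13: the call failed; wait, then poll the name service again.
                try {
                    Thread.sleep(pollMillis);
                } catch (InterruptedException interrupted) {
                    Thread.currentThread().interrupt();
                    throw new SystemDownException("interrupted while waiting to retry");
                }
            }
        }
        throw new SystemDownException("gave up after " + maxPolls + " lookups");
    }
}
```

The lookup runs again on every attempt because the replication manager may register a new primary between the failed call and the retry.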

Code Documentation

Current Status

Downloads

Fault Tolerance Design

Fault-Tolerance Evaluation

Feedback 4/3/2006


Real-Time Fault-Tolerant Baseline Application

Scenarios/Interactions

Unchanged from the Fault-Tolerant Baseline Application

Current Status

Real-time fault-tolerant baseline application is complete.

Feedback 5/2/2006


High-Performance Real-Time Fault-Tolerant Baseline Application

Scenarios/Interactions

  1. User enters a lot at an entry level and gets the available levels. In the current version, the entry level is always the first level.
  2. The system will let the driver know at the entrance whether there is space available in the lot or not.
  3. If there is space available in the lot, the system will let the driver know which levels have available spaces after the driver enters the lot.
  4. If no space is available in the lot, the system will let the driver know the nearest parking lots that have available space.
  5. If no space is available in any of the lots, the system will display an appropriate message.
  6. Once in a lot, the user can move from one level to either the next higher or lower level when such a level exists.
  7. In a lot, the user can only exit at an exit level, which is currently defined as the first level.
  8. The user will be informed when attempting to enter a nonexistent lot.
  9. The user cannot exit a lot if their car is not in one.
  10. The user cannot enter a lot if their car is already in a lot. The car must leave its current lot first.
  11. The server will report unrecoverable database conditions and other major problems to the client to inform them that service is unavailable.
  12. The client, which represents a car, can choose to drive up to any lot in the system.
  13. The user can optionally start the client in "get lots only" mode, which repeatedly calls the remote getLots() method for the requested number of invocations.
  14. If the server that the client randomly connected to gets killed, then the client will select a server instance that has not had a problem in the previous 5 * (fault detection timeout) milliseconds. A background thread will keep track of which servers are up along with which ones have problems, automatically failing over to a working server. The replication manager will notice a faulty server on its own, unregister that server's name from the name service, and restart that server.
  15. If all servers are killed and cannot be restarted, the replication manager will notice the failure and remove the active server name from the name service. The client will notice that no primary server is registered in the name service and, during a remote method invocation attempt, will notify the user that the system is down and exit. Renaming the server shell script to another name temporarily and then killing all servers can trigger this condition. Sometimes, however, this condition can be triggered if all servers are killed too quickly. A hypothetical future enhancement could have the replication manager hold off on declaring that the system is down until several seconds have elapsed.
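
The server selection rule in scenario 14 (skip any server that has had a problem within the previous 5 * (fault detection timeout) milliseconds) can be sketched as follows. The ServerHealthTracker class and its method names are hypothetical, timestamps are passed in explicitly to keep the example deterministic, and the sketch picks the first eligible server where the real client may choose randomly.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical record of recent server problems (illustrative only). */
final class ServerHealthTracker {
    private final long quarantineMillis;
    // Thread-safe map, since a background thread is described as updating server status.
    private final Map<String, Long> lastProblemAt = new ConcurrentHashMap<>();

    ServerHealthTracker(long faultDetectionTimeoutMillis) {
        // Window from scenario 14: 5 * (fault detection timeout) milliseconds.
        this.quarantineMillis = 5 * faultDetectionTimeoutMillis;
    }

    /** Records that a call to the given server failed at the given time. */
    void recordProblem(String server, long nowMillis) {
        lastProblemAt.put(server, nowMillis);
    }

    /** A server is eligible when it has had no problem within the window. */
    boolean isEligible(String server, long nowMillis) {
        Long last = lastProblemAt.get(server);
        return last == null || nowMillis - last > quarantineMillis;
    }

    /** Returns the first eligible server, or empty when none qualifies. */
    Optional<String> pick(List<String> registeredServers, long nowMillis) {
        for (String server : registeredServers) {
            if (isEligible(server, nowMillis)) {
                return Optional.of(server);
            }
        }
        return Optional.empty();
    }
}
```

With a fault detection timeout of 1500 ms, a server that fails is skipped for the next 7500 ms and becomes a candidate again afterward, provided it is still registered in the name service.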

Code Documentation

Current Status

We have achieved bounded high-performance real-time fault tolerance in our application.

As a result of our experiments and analyses, we can state that we have achieved a bounded failover time of one second when running on hardware similar to or better than the CMU ECE games cluster computers, under the following conditions:

  1. The fault detection wait time is set to 1500 ms.
  2. The fault injection rate is set to 4000 ms.
  3. At most 10 clients are using the system.
  4. Two servers are running.
  5. The faults are kill-server faults (kill -9 or killServer()).
  6. CPU load and memory usage on the servers are minimal.
  7. No network faults or performance degradations are taking place.
  8. The replication manager is running.
  9. All other conditions are normal.

Downloads (Final Demo Deliverables)