Team 2: The House

18-749: Fault-Tolerant Distributed Systems
Spring 2006


Team Members:

Team Roles:

 

Role                                                         Paul  Mo  Jun  Suk Chan  Joohoon
Baseline Application                                         x
Project Management & Webmastering                            x
Fault-Tolerance: Client                                      x     x
Fault-Tolerance: Evaluation Lead                                                      x
Fault-Tolerance: Replication Management and Fault Detection            x    x

Project Title: Party Blackjack

Baseline Application Description: An online, fault-tolerant, real-time, high-performance gaming application in which users play Blackjack and store their information in a database.

Configuration:

Third-party software, if any (databases):

Baseline Application Features:

Reliability Requirements:

Real-Time Requirements:

Performance Requirements:


Architecture

 

HostBean (session): Every request from the client goes through this bean, including saving/retrieving profiles.
FloorBean (entity): This bean represents the floor of our casino. It has a one-to-many CMR with TableBean.
TableBean (entity): This bean represents a Blackjack table. It has a one-to-many CMR with PlayerBean. It keeps track of the deck, and is responsible for passing around the turn.
PlayerBean (entity): This bean represents the profile of a player. It keeps track of the cards the player is holding, along with the player's current balance, name, ID, and password.
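
The one-to-many relationships above can be pictured with plain Java objects. This is only an illustrative sketch, not the team's EJB code: the class names `Floor`, `BlackjackTable`, and `Player` and all of their methods are assumptions, container-managed persistence is replaced by in-memory lists, and the deck and password are left out for brevity.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for PlayerBean: a profile plus the cards in hand.
class Player {
    final String id;
    final String name;
    int balance;
    final List<String> hand = new ArrayList<>();

    Player(String id, String name, int balance) {
        this.id = id;
        this.name = name;
        this.balance = balance;
    }
}

// Hypothetical stand-in for TableBean: owns its players and passes the turn around.
class BlackjackTable {
    private final List<Player> players = new ArrayList<>();
    private int turn = 0; // index of the player whose turn it is

    void seat(Player p) {
        players.add(p);
    }

    Player currentPlayer() {
        if (players.isEmpty()) throw new IllegalStateException("no players seated");
        return players.get(turn);
    }

    // Advance the turn to the next seated player, wrapping around at the end.
    Player passTurn() {
        if (players.isEmpty()) throw new IllegalStateException("no players seated");
        turn = (turn + 1) % players.size();
        return players.get(turn);
    }
}

// Hypothetical stand-in for FloorBean: one floor, many tables.
class Floor {
    final List<BlackjackTable> tables = new ArrayList<>();

    BlackjackTable openTable() {
        BlackjackTable t = new BlackjackTable();
        tables.add(t);
        return t;
    }
}
```

In this sketch a floor opens tables, a table seats players, and `passTurn()` cycles through the seated players in order.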

 

Baseline Application

 

Current Status: completed

Downloads



Fault-Tolerant Baseline Application

Each replica, as it starts, notifies the replication manager of its presence and sends its JNDI address. The replication manager keeps track of alive/dead replicas. After being notified of a new replica's presence, the replication manager notifies the fault detector, which communicates with local fault detectors on the replicas. The local fault detector is used to detect process failures, while the (global) fault detector is used to detect machine failures and notify the replication manager of process/machine failures. When a client gets an exception and fails over, it asks the replication manager for a new primary.
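
As a rough illustration of the bookkeeping described above, the following sketch keeps a registry of replicas and their liveness in memory. It is not the project's actual replication manager: the class name `ReplicationManagerSketch`, its method names, and the address strings used with it are all assumptions, and the fault detectors are reduced to a single `reportFailure` call.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical in-memory replication manager: tracks replicas and their liveness.
class ReplicationManagerSketch {
    // replica name -> JNDI address, in registration order
    private final Map<String, String> addresses = new LinkedHashMap<>();
    // replica name -> true if the replica is currently believed to be alive
    private final Map<String, Boolean> alive = new LinkedHashMap<>();

    // Called by a replica when it starts. A replica that registers again
    // (for example after a restart) is moved to the end of the order.
    synchronized void register(String name, String jndiAddress) {
        addresses.remove(name);
        alive.remove(name);
        addresses.put(name, jndiAddress);
        alive.put(name, true);
    }

    // Called by the (global) fault detector on a process or machine failure.
    synchronized void reportFailure(String name) {
        if (alive.containsKey(name)) {
            alive.put(name, false);
        }
    }

    // Called by a client that has failed over: the first live replica
    // in registration order is handed out as the new primary.
    synchronized Optional<String> currentPrimary() {
        for (Map.Entry<String, Boolean> e : alive.entrySet()) {
            if (e.getValue()) {
                return Optional.of(addresses.get(e.getKey()));
            }
        }
        return Optional.empty();
    }
}
```

A client that catches an exception would call `currentPrimary()` and reconnect to the returned address, if any.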

* No checkpointing or state transfer is necessary for now, since our servers are completely stateless.

* Each client has a unique ID, and each state-changing invocation carries an operation ID, so that duplicate state changes can be avoided.
* The operation ID increments with each method invocation that changes state. Both the unique ID and the operation ID are kept in the database.
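
One way to picture the duplicate check is the sketch below. It is a minimal illustration under stated assumptions, not the team's implementation: the class `DuplicateFilter` and its `applyOnce` method are hypothetical, a `HashMap` stands in for the database, and operation IDs are assumed to start at 1 and grow by one per state-changing call.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical duplicate filter. The map stands in for the database table
// that stores the last applied operation ID for each client.
class DuplicateFilter {
    private final Map<String, Long> lastApplied = new HashMap<>();

    // Runs the state change only if operationId is newer than the last one
    // applied for this client. Returns true if the change was applied.
    synchronized boolean applyOnce(String clientId, long operationId, Runnable stateChange) {
        long last = lastApplied.getOrDefault(clientId, 0L);
        if (operationId <= last) {
            return false; // a retry of an operation that already took effect
        }
        stateChange.run();
        lastApplied.put(clientId, operationId);
        return true;
    }
}
```

If a client retries operation 1 after a failover, the second call returns `false` and the state change is not repeated, while operation 2 from the same client, or operation 1 from another client, is applied normally.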

Scenarios/Interactions

1. There is one active primary, and one warm-passive backup, named 'server1' and 'server2' respectively.
2. The client starts and looks up available servers in the global JNDI. It finds 'server1' and connects to it.
3. Every time the client wants to create an object, it sends the request to the replication manager instead of to the replicas. The replication manager then creates duplicate beans on both servers.
4. The client wants to create a Host stateless session bean, so it sends the request to the replication manager, which in turn creates identical session beans on both the primary and the backup server.
5. 'server1' dies and the client gets an exception.
6. The fault detector detects that 'server1' is dead and notifies the replication manager.
7. The replication manager modifies the global JNDI accordingly.
8. The client keeps getting an exception until 'server1' is removed from the global JNDI.
9. The client looks into the global JNDI and finds that 'server1' is no longer there but 'server2' is available. It connects to 'server2'.
10. The replication manager attempts to remotely restart 'server1'.
11. Once the fault detector notifies the replication manager that 'server1' has revived, the replication manager adds 'server1' to the end of the list of available servers in the global JNDI.
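
The client side of steps 5 through 9 can be sketched as a retry loop. The code below is an assumption-laden illustration rather than the project's client: `FailoverClient` and `invoke` are hypothetical names, the `Supplier` stands in for reading the list of available servers from the global JNDI, and a real client would catch the specific remote or naming exceptions instead of `RuntimeException`.

```java
import java.util.List;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical client-side failover helper. The Supplier stands in for a
// lookup of the available-server list in the global JNDI.
class FailoverClient {
    private final Supplier<List<String>> serverLookup;
    private final int maxAttempts;

    FailoverClient(Supplier<List<String>> serverLookup, int maxAttempts) {
        this.serverLookup = serverLookup;
        this.maxAttempts = maxAttempts;
    }

    // Invokes the call on the first listed server. On an exception, the list
    // is read again and the call is retried until the attempts run out.
    <T> T invoke(Function<String, T> call) {
        RuntimeException lastError = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            List<String> servers = serverLookup.get();
            if (servers.isEmpty()) {
                lastError = new IllegalStateException("no servers available");
                continue;
            }
            try {
                return call.apply(servers.get(0));
            } catch (RuntimeException e) {
                lastError = e; // the server may be dead; look the list up again
            }
        }
        throw new IllegalStateException("all attempts failed", lastError);
    }
}
```

The loop keeps failing against 'server1' until the list no longer contains it, which mirrors step 8; once 'server2' is first in the list, the call goes through.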

Current Status: Completed

Fault-Tolerance Evaluation

FT-eval.doc

749-Team2-Evaluation.doc (Preliminary Result)

749-Team2-Evaluation.pdf (Preliminary Result)

749_probe_data1_12.zip (raw probe data for configurations 1-12 of 48)

Final Report!

Downloads

Final Raw Data!



Real-Time Fault-Tolerant Baseline Application

Scenarios/Interactions

Current Status

Finished

Downloads

Failover evaluation, graph, pie chart, and proposed strategies (.doc) (.pdf)


Final Report (ppt)
CODE/BINARIES (Final Demo also available at /afs/ece/class/ece749/public_html/teams-06/team2/final_demo. See INSTRUCTIONS in the folder for explicit instructions on how to run our system.)



High-Performance Real-Time Fault-Tolerant Baseline Application

Scenarios/Interactions

Current Status

Downloads