Project Page

Team 2: The House

18-749: Fault-Tolerant Distributed Systems
Spring 2006

Team Members:

Hwi H. Cheong "Paul" hcheong@andrew.cmu.edu
Mohammad Ahmad "Mo" mohman@cmu.edu
Jun Han junhan@andrew.cmu.edu
Suk Chan Kang sckang@andrew.cmu.edu
Joohoon Lee jool@andrew.cmu.edu

Team Roles:

Role                                                         Paul  Mo  Jun  Suk Chan  Joohoon
Baseline Application                                         x
Project Management & Webmastering                            x
Fault-Tolerance: Client                                      x     x
Fault-Tolerance: Evaluation Lead                                                      x
Fault-Tolerance: Replication Management and Fault Detection            x    x

Project Title: Party Blackjack

Baseline Application Description: Online fault-tolerant, real-time, high-performance gaming application where users play Blackjack and store their information in a database.

Configuration:

middleware - EJB
operating system - Unix
language - Java

Third-party software, if any (databases):

MySQL

Baseline Application Features:

Allows a player to play Blackjack and place bets.
Allows a new player to create his/her profile.
Allows players to store/retrieve their information.

Reliability Requirements:

Preserves on-going game details under a single server failure.
Preserves on-going game details and waits with a timeout under a client failure. (timeout: 3-5 min)
Preserves details of on-going transaction under a single server failure.

Real-Time Requirements:

After each Blackjack game, the user's account information is updated in the database, and the next game should start within 3 seconds.
User retrieves existing profile within 1 second of requesting retrieval.
The upgraded version of Party Blackjack should allow multiple players per table, with each player able to view other players' actions within 500 ms of each action.

Performance Requirements:

Server handles up to 10 tables with 1 player per table (for baseline).
Later, the server should handle up to 10 tables with multiple players per table (possibly 6 players per table).
Database can store profiles for up to 500 players.

Architecture

HostBean (session): Every request from the client goes through this bean, including saving/retrieving profiles.
FloorBean (entity): This bean represents the floor of our casino. It has a one-to-many CMR with TableBean.
TableBean (entity): This bean represents a Blackjack table. It has a one-to-many CMR with PlayerBean. It keeps track of the deck, and is responsible for passing around the turn.
PlayerBean (entity): This bean represents the profile of a player. It keeps track of the cards the player is holding, current balance, name, ID, and password.

Baseline Application

Current Status: completed

Downloads

Fault-Tolerant Baseline Application

Each replica, as it starts, notifies the replication manager of its presence and sends its JNDI address. The replication manager keeps track of alive/dead replicas. After being notified of a new replica's presence, the replication manager notifies the fault detector, which communicates with local fault detectors on the replicas. The local fault detector is used to detect process failures, while the (global) fault detector is used to detect machine failures and notify the replication manager of process/machine failures. When a client gets an exception and fails over, it asks the replication manager for a new primary.
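The bookkeeping described above can be sketched as a small in-memory registry. This is only an illustration under assumptions: the class and method names (ReplicaRegistry, register, markDead, currentPrimary) are hypothetical and not taken from the project code, the address strings are placeholders, and the rule that the first live replica in registration order acts as primary is an assumption made for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of the replication manager's bookkeeping.
// Names and the "first live replica is primary" rule are assumptions.
class ReplicaRegistry {
    // Insertion-ordered map: replica name -> JNDI address (placeholder format).
    private final Map<String, String> alive = new LinkedHashMap<>();

    // Called when a replica starts and announces itself.
    synchronized void register(String name, String jndiAddress) {
        alive.remove(name);           // a re-registered replica moves to the end
        alive.put(name, jndiAddress);
    }

    // Called when the fault detector reports a process or machine failure.
    synchronized void markDead(String name) {
        alive.remove(name);
    }

    // Called by a client that got an exception and needs a new primary.
    synchronized Optional<String> currentPrimary() {
        return alive.keySet().stream().findFirst();
    }
}

class ReplicaRegistryDemo {
    public static void main(String[] args) {
        ReplicaRegistry rm = new ReplicaRegistry();
        rm.register("server1", "host1:1099"); // placeholder addresses
        rm.register("server2", "host2:1099");
        rm.markDead("server1");
        System.out.println(rm.currentPrimary().orElse("none"));
    }
}
```

In a deployed system the registry would also forward each registration to the fault detector; that step is omitted here to keep the sketch self-contained.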

* No checkpointing or state transfer is necessary for now, since our servers are completely stateless.

* Each client will have a unique ID and an operation ID for each state-changing invocation to avoid duplicate state changes.
* The operation ID will increment with each method invocation that changes state. Both unique ID and operation ID are kept in the database.
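The duplicate-suppression idea in the two points above can be sketched as follows. This is a minimal illustration, not the project's code: DuplicateFilter and applyOnce are hypothetical names, and an in-memory map stands in for the database table that holds the last applied operation ID per client.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: suppresses a retried state change by comparing the
// request's operation ID with the last one applied for that client.
// The map is an assumption standing in for the database.
class DuplicateFilter {
    private final Map<String, Long> lastApplied = new HashMap<>();

    // Runs stateChange only if operationId is newer than the last applied
    // ID for this client. Returns true if the change was applied.
    synchronized boolean applyOnce(String clientId, long operationId, Runnable stateChange) {
        Long last = lastApplied.get(clientId);
        if (last != null && operationId <= last) {
            return false; // retry of an operation that already took effect
        }
        stateChange.run();
        lastApplied.put(clientId, operationId);
        return true;
    }
}

class DuplicateFilterDemo {
    public static void main(String[] args) {
        DuplicateFilter filter = new DuplicateFilter();
        int[] balance = {100};
        filter.applyOnce("client-1", 1, () -> balance[0] -= 10); // applied
        filter.applyOnce("client-1", 1, () -> balance[0] -= 10); // ignored retry
        System.out.println(balance[0]);
    }
}
```

With a real database, the state change and the operation ID update would need to commit in the same transaction; otherwise a crash between the two could still let a retry apply the change twice.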

Scenarios/Interactions

1. There is one active primary, and one warm-passive backup, named 'server1' and 'server2' respectively.
2. The client starts and looks up available servers in the global JNDI. It finds 'server1' and connects to it.
3. Every time the client wants to create an object, it sends the request to the replication manager instead of the replicas. The replication manager then creates duplicate beans on both servers.
4. The client wants to create a Host stateless session bean, so it sends the request to the replication manager, which in turn creates identical session beans on both the primary and backup servers.
5. 'server1' dies and the client gets an exception.
6. The fault detector detects that 'server1' is dead and notifies the replication manager.
7. The replication manager modifies the global JNDI accordingly.
8. The client keeps getting an exception until 'server1' is removed from the global JNDI.
9. The client looks into the global JNDI and finds that 'server1' is no longer there but 'server2' is available. It connects to 'server2'.
10. The replication manager attempts to remotely restart 'server1'.
11. Once the fault detector notifies the replication manager that 'server1' has revived, the replication manager adds 'server1' to the end of the list of available servers in the global JNDI.
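The client side of steps 5 through 9 can be sketched as a retry loop that looks up the server list again after each exception. This is a sketch under assumptions: FailoverClient and invokeWithFailover are hypothetical names, a Supplier stands in for the global JNDI lookup, a Function stands in for the remote call, and the attempt limit is an illustrative parameter rather than a value from the project.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical sketch of client-side failover.
class FailoverClient {
    // lookup: returns the current list of servers (stands in for global JNDI).
    // call:   performs the request on the named server and throws on failure.
    static <T> T invokeWithFailover(Supplier<List<String>> lookup,
                                    Function<String, T> call,
                                    int maxAttempts) {
        RuntimeException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            List<String> servers = lookup.get();
            if (servers.isEmpty()) {
                last = new IllegalStateException("no servers registered");
                continue;
            }
            try {
                return call.apply(servers.get(0)); // first listed server is the primary
            } catch (RuntimeException e) {
                last = e; // the dead server may still be listed; look up again
            }
        }
        throw new IllegalStateException("all attempts failed", last);
    }
}

class FailoverClientDemo {
    public static void main(String[] args) {
        List<String> directory = new ArrayList<>(Arrays.asList("server1", "server2"));
        String reply = FailoverClient.invokeWithFailover(
            () -> directory,
            name -> {
                if (name.equals("server1")) {
                    directory.remove("server1"); // simulates the replication manager's update
                    throw new RuntimeException("server1 is down");
                }
                return "handled by " + name;
            },
            3);
        System.out.println(reply);
    }
}
```

A real client would likely pause between attempts, since step 8 implies the dead server can remain listed for a short time before the replication manager removes it.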

Current Status: Completed

Fault-Tolerance Evaluation

FT-eval.doc

749-Team2-Evaluation.doc (Preliminary Result)

749-Team2-Evaluation.pdf (Preliminary Result)

749_probe_data1_12.zip (RAW Probe data for configuration 1~12 (out of 48))

Final Report!

Downloads

Final Raw Data!

Real-Time Fault-Tolerant Baseline Application

Scenarios/Interactions

Current Status

Finished

Downloads

Failover evaluation, graph, piechart, and proposed strategies (.doc) (.pdf)

Final Report (ppt)
CODE/BINARIES (Final Demo also available at /afs/ece/class/ece749/public_html/teams-06/team2/final_demo. See INSTRUCTIONS in the folder for explicit instructions on how to run our system.)

High-Performance Real-Time Fault-Tolerant Baseline Application

Scenarios/Interactions

Current Status

Downloads