Team 5: Virtual Casino

Fault Tolerance Tests

Functionality:
Fault Tolerance
System Elements:
Test distribution:
Setting up the System
Test 1 - Fault Tolerance

In this test we demonstrate that the Player Application can recover from a loss of contact with the primary server and reconnect to the new primary. This facility is handled almost entirely by the Replication Manager. The client only needs to re-bind to the same name it had bound to before, because the Replication Manager automatically swaps in a reference to a new server. The Replication Manager also alerts the new primary that it is now the primary so that it can retrieve its state from the database. Note that the Player Application, the database server, and the ORB are considered "sacred".

  1. At the Player Application Prompt " Q--Please enter your Player name: ", enter your name.
  2. At the next prompt, do nothing yet.
  3. From another shell, ssh to the host running the primary server. Type "kill -9 PID", where PID is the process ID of the primary server. The host and PID of the primary server are displayed in the output from the Replication Manager as it starts the servers.
  4. The Replication Manager should indicate that it has lost contact with the primary, that it is assigning the old backup to be the primary, and that it is bringing up a new server to replace the lost one.
  5. Now type the number of chips that you want to buy at the Player Application prompt. The Player Application should behave as if nothing adverse happened, because it has found a new server.
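
The client-side recovery exercised by Test 1 can be sketched roughly as follows. This is only an illustration of the re-bind idea, not the project's actual code: `ServerUnavailable`, `call_with_rebind`, the `resolve` callback, and the name "CasinoServer" are all hypothetical stand-ins for whatever the Player Application and the ORB really use.

```python
import time


class ServerUnavailable(Exception):
    """Hypothetical error raised when a server reference is no longer reachable."""


def call_with_rebind(resolve, name, operation, retries=5, delay=0.5):
    """Run operation(server); on failure, re-bind to the same name and retry.

    resolve(name) is assumed to return whatever reference is currently
    registered under `name`. After a failover this is the new primary,
    because the Replication Manager has swapped the reference.
    """
    server = resolve(name)
    for attempt in range(retries + 1):
        try:
            return operation(server)
        except ServerUnavailable:
            if attempt == retries:
                raise  # give up after the last allowed retry
            time.sleep(delay)       # give the Replication Manager time to fail over
            server = resolve(name)  # re-bind to the same name


if __name__ == "__main__":
    # Simulated name lookup: the first reference is dead, the second is live.
    references = iter(["old-primary", "new-primary"])

    def buy_chips(server):
        if server == "old-primary":
            raise ServerUnavailable(server)
        return "chips bought via " + server

    print(call_with_rebind(lambda name: next(references), "CasinoServer",
                           buy_chips, delay=0))
```

Retrying against the same name, rather than caching a server address, is what lets the player see no more than a short pause or a repeated prompt.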
Test 2 - Recovery

In this test we demonstrate that we can kill the primary server and recover, then kill the new primary and recover again. This is all handled automatically by the Replication Manager.

  1. At the Player Application Prompt " Q--Please enter your Player name: ", enter your name.
  2. Continue following the prompts until you are inside a game.
  3. From another shell, ssh to the host running the primary server. Type "kill -9 PID", where PID is the process ID of the primary server. The host and PID of the primary server are displayed in the output from the Replication Manager as it starts the servers.
  4. The Replication Manager should indicate that it has lost contact with the primary, that it is assigning the old backup to be the primary, and that it is bringing up a new server to replace the lost one.
  5. The Player Application should behave as if nothing adverse happened, although it might re-prompt you with the last question it had asked before the crash.
  6. Repeat step 3, this time killing the new primary.
  7. Again, play should continue without interruption, apart from possibly one repeated query.
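
The repeated failover in Test 2 can be modelled with a small simulation of the bookkeeping the Replication Manager is described as doing: promote the old backup, then bring up a replacement. This is a sketch under assumptions. The `ReplicaGroup` class, its method names, and the server labels are hypothetical, and the sketch assumes the replacement server becomes the new backup, which the test description does not state explicitly.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class ReplicaGroup:
    """Hypothetical model of one primary/backup pair."""
    primary: str
    backup: str
    _ids: count = field(default_factory=lambda: count(1), repr=False)

    def spawn_replacement(self):
        # Stand-in for launching a real server process on some host.
        return f"replacement-{next(self._ids)}"

    def on_primary_lost(self):
        """Promote the backup and start a replacement; return the lost server."""
        lost = self.primary
        self.primary = self.backup              # old backup becomes the primary
        self.backup = self.spawn_replacement()  # assumed: new server is the backup
        return lost


if __name__ == "__main__":
    group = ReplicaGroup(primary="server-A", backup="server-B")
    for _ in range(2):  # kill the primary twice, as in steps 3 and 6
        lost = group.on_primary_lost()
        print(f"lost {lost}; primary is now {group.primary}, backup is {group.backup}")
```

In the real system the newly promoted primary would also reload its state from the database, which this simulation leaves out.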
Test 3 - Exception Handling

In this test we demonstrate the ability of our code to handle exceptions.

Handled Exceptions
  • If the ORB is not running in the right place and you try to start the Replication Manager, the Replication Manager will shut down gracefully.
  • If the ORB is not running and the Player tries to register with the server, the Player Application will handle it.
  • If a server is not running and the Player tries to register with it, the Player application will handle it.
  • The prompt for buying quantities of chips will reject out-of-range or nonsensical responses.
  • The prompt for betting will not allow you to bet more than you have, nor less than 10 or more than 25000 in any case.
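
The betting rule in the last item above can be illustrated with a short validation sketch. The limits of 10 and 25000 come from the rule itself; the function name `parse_bet`, the constant names, and the error messages are hypothetical and are not taken from the Player Application.

```python
MIN_BET = 10      # smallest bet allowed in any case
MAX_BET = 25000   # largest bet allowed in any case


def parse_bet(text, chips_held):
    """Return the bet as an int, or raise ValueError explaining the rejection."""
    try:
        bet = int(text.strip())
    except ValueError:
        # Nonsensical input such as "abc" or an empty line.
        raise ValueError("bet must be a whole number") from None
    if bet < MIN_BET:
        raise ValueError(f"bet must be at least {MIN_BET}")
    if bet > MAX_BET:
        raise ValueError(f"bet must be at most {MAX_BET}")
    if bet > chips_held:
        raise ValueError("bet exceeds the chips you hold")
    return bet


if __name__ == "__main__":
    for answer in ["50", "9", "25001", "abc", "200"]:
        try:
            print(answer, "accepted as", parse_bet(answer, chips_held=100))
        except ValueError as err:
            print(answer, "rejected:", err)
```

A prompt loop would call such a check and simply ask again on rejection, so a bad answer never reaches the server.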
Unhandled Exceptions
  • If a client is killed, the server gets into a race condition.
  • If the database is in a corrupt state from a previous test, the server cannot handle it.
Limitations
Last modified: Sat Mar 15 16:17:31 EST 2003