Scenarios are generated by taking our initial baseline, identifying where we could and could not test faults, and determining what the correct action of the system should be. Faults are injected by killing the server process on a given machine with kill -9. At the moment this is handled by the replication manager, through a public method in the fault-injector client that is under development.
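As a rough illustration of the injection mechanism, the fault-injector only needs the target server's OS process ID to force a crash fault; sending SIGKILL (kill -9) gives the server no chance to clean up or flush state. The class and method names below are hypothetical, a minimal sketch assuming the replication manager tracks each server's pid:

```java
import java.io.IOException;
import java.util.List;

public class FaultInjector {
    /** Build the kill -9 command for a given server pid. */
    public static List<String> killCommand(long pid) {
        return List.of("kill", "-9", Long.toString(pid));
    }

    /** Inject a crash fault: SIGKILL the server process so it dies
        immediately, without any shutdown hooks or cleanup running. */
    public static void crashServer(long pid) throws IOException {
        new ProcessBuilder(killCommand(pid)).start();
    }
}
```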
Our initial use case was simple: the user enters a name and a list of stocks, the data is stored in the database, and the user can retrieve it later. Users now have a more advanced GUI, and their interaction with the system is more involved. Users log in; an existing user receives their portfolio and starts receiving updates “immediately”. A new user instead creates a portfolio by selecting stocks, and upon creation the client starts receiving updates on the stocks in that portfolio. At this point both new and existing users are receiving a stream of updated data from the server they are connected to, and can at any point add or remove stocks to or from their portfolio.
Scenarios:
Fault-injection:
Our designated fault-injection robustness covers server crash faults. A crash may induce message loss, but message loss is not yet a focus of our system’s robustness requirements. We will be treating the database, the replication manager, the “external” stock ticker data feed, and the JNDI service as “Sacred”: given that these are working properly, messages are assumed not to be lost in transit between the servers and the database. More refined fault injection (with granularity going down to inter-class/inter-bean communication) is expected soon.
When we require a fault to be tested, either our standalone fault-injector or the crash-fault-enabled replication manager can kill a server. Because the client continuously communicates with the server it is connected to, within one second the client will “know” that there is a problem on the other end. The client then picks a random server from the server properties list and keeps trying to connect to servers until communication is resumed. At that point, the client resends whatever request it was attempting. Due to the idempotent nature of our system, repeat requests are not dangerous; they only serve to confirm the request in the event that it was already submitted. If the client tries to add or remove stocks that are already there or gone, the server gracefully handles the repeat request.
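The reason repeat requests are safe can be seen by modeling a portfolio as a set of ticker symbols: set-based add and remove are naturally idempotent, so a request resent after fail-over simply confirms the first one. The class below is a hypothetical sketch of this idea, not our actual server bean:

```java
import java.util.HashSet;
import java.util.Set;

public class Portfolio {
    private final Set<String> stocks = new HashSet<>();

    /** Returns true if the stock was newly added; a resent add
        of a stock already present is a harmless no-op. */
    public synchronized boolean addStock(String symbol) {
        return stocks.add(symbol);
    }

    /** Returns true if the stock was present; a resent remove
        of a stock already gone is a harmless no-op. */
    public synchronized boolean removeStock(String symbol) {
        return stocks.remove(symbol);
    }

    public synchronized boolean contains(String symbol) {
        return stocks.contains(symbol);
    }
}
```

Either way, the server can report success to the client, keeping the retry invisible to the user.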
On the server side, when a server is killed, it is automatically re-spawned by the replication manager. The state of the stock-cache is copied over, and a stock-update queue is also transferred, until the newly raised server is functioning consistently with the others. At that point, the “new” server is re-registered with the JNDI service, and the client, if fail-over occurs again, has the chance to reconnect with this “new” server.
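The state transfer can be sketched as a two-step hand-off: the re-spawned server receives a snapshot of the stock cache, then replays the updates that were queued while the snapshot was in flight; once the queue is drained, the new server is consistent with the others and can be re-registered with JNDI. The names below are hypothetical and the cache is simplified to symbol-to-price:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

public class StateTransfer {
    /** Rebuild a server's stock cache from a snapshot plus the
        updates queued during the transfer. Later queued updates
        overwrite snapshot values, so the result is up to date. */
    public static Map<String, Double> bringUpToDate(
            Map<String, Double> cacheSnapshot,
            Queue<Map.Entry<String, Double>> updateQueue) {
        Map<String, Double> cache = new HashMap<>(cacheSnapshot);
        while (!updateQueue.isEmpty()) {
            Map.Entry<String, Double> update = updateQueue.poll();
            cache.put(update.getKey(), update.getValue());
        }
        return cache;
    }
}
```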
To avoid the client cycling between two servers, the client randomly selects a server to start trying to connect to (ensuring it isn’t the one that just dropped it), then proceeds iteratively through the list of servers until connection is resumed.
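The selection policy above can be sketched as follows: pick a random starting index whose server is not the one that just failed, then walk the list round-robin, skipping the failed server on this pass. The class name and method are hypothetical, assuming a list of at least two server names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class FailoverPolicy {
    private final List<String> servers;
    private final Random random = new Random();

    public FailoverPolicy(List<String> servers) {
        this.servers = servers;
    }

    /** Order in which to retry servers: a random start that is not
        the failed server, then the rest of the list round-robin. */
    public List<String> retryOrder(String failedServer) {
        int start;
        do {
            start = random.nextInt(servers.size());
        } while (servers.get(start).equals(failedServer));

        List<String> order = new ArrayList<>();
        for (int i = 0; i < servers.size(); i++) {
            String s = servers.get((start + i) % servers.size());
            if (!s.equals(failedServer)) {
                order.add(s);
            }
        }
        return order;
    }
}
```

Randomizing the starting point spreads failed-over clients across the surviving servers instead of stampeding them all onto the first entry in the list.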
Summary:
A user can connect, activate or create a portfolio, and then receive a stream of update data. The user can change what data it receives from the feed. Upon any server failure, the server is restarted by the replication manager, while the client fails over to another server on the list. If a client was in the middle of a request, that request is sent again. If the request involves anything that was already committed, the server handles it gracefully, keeping the process transparent to the user.