Team 7: Sixers

18-749: Fault-Tolerant Distributed Systems
Spring 2006

 

 

1.  Team 7: Sixers

A.      Team Members:

·         Heejoon Jung, MSIT-SE, heejoonj@andrew.cmu.edu

·         Wangbong Lee, MSE, wangbonl@andrew.cmu.edu

·         Minho Jeung, MSE, mjeung@andrew.cmu.edu

·         Hou Kyu, MSE, kyuh@andrew.cmu.edu

·         Wen Shu Tang, ECE, wtang@andrew.cmu.edu

B.      Team Roles:

Roles are divided among Heejoon, Wangbong, Minho, Kyu, and Wen Shu in the following areas:

·         Development: Business Logic, Client, Server, GUI, DB

·         Analysis: Real-Time, Performance, Reliability

·         Testing

·         Documentation

·         Presentation

·         Configuration

·         Web Page

C.      Project Title: 

Express Bus Ticket Center

D.     Baseline Application Description:

An online ticketing application for express buses that allows users to search schedules and available seats, reserve tickets, and check reservation status.

E.      Configuration:

·         Operating System

·         Server: Linux

·         Client: MS Windows XP

·         Programming Language: Java

·         Database: MySQL

·         Middleware: CORBA
* This configuration may change to suit future environments.

F.       Third-party software, if any (databases):

·         Eclipse

·         JSP

·         Apache & Tomcat
* This configuration may change to suit future environments.

G.     Baseline Application Features:

·         Users can search bus schedules, tickets, and available seats.

·         Users can choose available seats.

·         Users can reserve tickets.

·         Users can cancel tickets that they have purchased.

·         Users can check the current status of seat occupancy.

H.     Reliability Requirements:

·         The system should be available 24 hours per day, 7 days per week.

·         The system should recover within 5 seconds of a failure.

·         Ongoing transactions should be preserved when the system fails.

·         The system should preserve user accounts when the system fails.

·         All transaction data should be preserved when the system fails.

I.        Real-Time Requirements:

·         Log-in should be processed within 2 seconds.

·         It should take less than 5 seconds to search 1,000 results.

·         It should take less than 4 seconds to book bus tickets.

J.       Performance Requirements:

·         The system shall serve up to 500 users at a time.

·         The system shall deal with up to 50 concurrent transactions.

 


 

2.  Baseline Application

A.      Interfaces

              Methods

  Login() : A user logs into the Express Bus Ticket System. (Optional)
  CreateProfile() : A new user creates a profile to connect to the Express Bus Ticket System. (Optional)
  UpdateProfile() : A user updates his or her personal profile. (Optional)
  RetrieveSchedule() : A user retrieves the bus schedule from the Express Bus Ticket System.
  BuyTicket() : A user buys a ticket in the Express Bus Ticket System when one is available.
  CancelTicket() : A user cancels a reserved ticket.
  GetMyInfo() : The system shows the user his or her reservation history. (Optional)
  IsCreditCardValid() : The system checks whether the credit card is valid.
  UpdateCreditCardInfo() : The system updates the user’s credit card information. (Optional)
  UpdateSeatInfo() : The system updates the seat status. (Optional)
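
As a rough illustration of how the non-optional methods above might look from Java, the sketch below declares them as an interface with a tiny in-memory implementation for one bus. The parameter types, return types, and the names TicketCenter and InMemoryTicketCenter are assumptions; the list above names only the methods, and the real system would expose them through CORBA.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical Java view of the core methods; all signatures are assumptions.
interface TicketCenter {
    List<String> retrieveSchedule(String from, String to);
    boolean isCreditCardValid(String cardNumber);
    boolean buyTicket(String busNumber, String cardNumber);
    boolean cancelTicket(String busNumber);
}

// Minimal in-memory stand-in that manages a single bus, for illustration only.
class InMemoryTicketCenter implements TicketCenter {
    private final String busNumber;
    private final String from;
    private final String to;
    private final int totalSeats;
    private int availableSeats;

    InMemoryTicketCenter(String busNumber, String from, String to, int seats) {
        this.busNumber = busNumber;
        this.from = from;
        this.to = to;
        this.totalSeats = seats;
        this.availableSeats = seats;
    }

    public List<String> retrieveSchedule(String from, String to) {
        List<String> result = new ArrayList<String>();
        if (this.from.equals(from) && this.to.equals(to)) {
            result.add(busNumber + " " + from + "->" + to + " seats=" + availableSeats);
        }
        return result;
    }

    public boolean isCreditCardValid(String cardNumber) {
        // Placeholder rule (assumption): non-empty and digits only.
        return cardNumber != null && cardNumber.matches("\\d+");
    }

    public boolean buyTicket(String busNumber, String cardNumber) {
        if (!this.busNumber.equals(busNumber) || availableSeats == 0
                || !isCreditCardValid(cardNumber)) {
            return false;
        }
        availableSeats--;
        return true;
    }

    public boolean cancelTicket(String busNumber) {
        if (!this.busNumber.equals(busNumber) || availableSeats == totalSeats) {
            return false;
        }
        availableSeats++;
        return true;
    }
}
```

Returning a boolean keeps the sketch short; the exceptions listed later in this section would be a more expressive way to report why a purchase failed.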

 Attributes

  Bus information (bus number, from, to, departure time, arrival time, number of available seats)
  Payment method is credit card
  Credit card information (name, card number)
  Profile information (name, password, user id, credit card number)
  Number of connected users

 Exceptions

  Application Exceptions

  InvalidUserInfo()
  InvalidCreditCard()
  OutOfAvailableSeats()
  AlreadyLoggedOnUser()
  InsufficientUserInfo()


  System Exceptions

  CreditCardServerNotResponding()
  ScheduleServerNotResponding()
  ServerIsBusy()
  ServerOutOfCapacity()
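
The two exception groups above could be modeled in Java as two base classes, which lets a client decide in one place whether a failure is worth retrying on another replica. The sketch below is an assumption-laden illustration: the base class names, the ExceptionPolicy helper, and the retry rule are hypothetical, and only four of the listed exceptions are shown.

```java
// Hypothetical base classes for the two groups listed above.
class TicketApplicationException extends Exception {
    TicketApplicationException(String message) { super(message); }
}

class TicketSystemException extends Exception {
    TicketSystemException(String message) { super(message); }
}

// Application exceptions: the request itself cannot be satisfied.
class InvalidCreditCard extends TicketApplicationException {
    InvalidCreditCard() { super("credit card is not valid"); }
}

class OutOfAvailableSeats extends TicketApplicationException {
    OutOfAvailableSeats() { super("no seats available"); }
}

// System exceptions: the infrastructure failed, not the request.
class ServerIsBusy extends TicketSystemException {
    ServerIsBusy() { super("server is busy"); }
}

class ScheduleServerNotResponding extends TicketSystemException {
    ScheduleServerNotResponding() { super("schedule server not responding"); }
}

class ExceptionPolicy {
    // Assumed rule: only system exceptions justify trying another replica;
    // application exceptions are reported back to the user.
    static boolean shouldFailOver(Exception e) {
        return e instanceof TicketSystemException;
    }
}
```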

Scenarios/Interactions

Current Status

Downloads


 

3.  Fault-Tolerant Baseline Application

1.      Replication

A.   How many copies do we need?

-        2 copies (risk, chess)

B.    Replication Type

-        Active replication

Advantage: Performance

Disadvantage: More memory and processing cost

C.   Replication Manager

-        There is no dedicated replication manager.

-        When the client application starts, it acquires the replica information stored in the Naming Server.

 

2.      Fault Detection

A.      Clients:

-        The server has been killed.

-        The network is down (detected by timeout).

-        The server is too busy to respond.

B.      Server:

-        The database does not respond because the network between the server application and the database is down.

-        A database query takes too long.

 

3.     Fail-Over

If the server that a client is communicating with fails, the client is notified through a server exception. The client already knows the other replicas, because it acquires the replica information stored in the Naming Server when it starts. The client can therefore continue by communicating with another replica.
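
A minimal sketch of this client-side fail-over is shown below. It assumes the client holds a list of replicas obtained once at startup; the Replica interface and FailoverClient class are hypothetical stand-ins for the CORBA object references that would come from the Naming Server.

```java
import java.util.List;

// Hypothetical stand-in for a remote server replica.
interface Replica {
    String handle(String request) throws Exception;
}

class FailoverClient {
    private final List<Replica> replicas; // acquired once, at client startup
    private int current = 0;              // index of the replica in use

    FailoverClient(List<Replica> replicas) {
        this.replicas = replicas;
    }

    // Send a request; on an exception, switch to the next replica and retry.
    String send(String request) {
        Exception last = null;
        for (int attempt = 0; attempt < replicas.size(); attempt++) {
            try {
                return replicas.get(current).handle(request);
            } catch (Exception e) {
                last = e; // fault detected through the exception
                current = (current + 1) % replicas.size();
            }
        }
        throw new IllegalStateException("all replicas failed", last);
    }
}
```

Because the index of the working replica is remembered, later requests go straight to it instead of retrying the failed server first.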

 

4.      Recovery

-        Re-instantiate the failed replica.

-        When recovery succeeds, the result is written to a log in the database.

 

5.     Checkpointing

-        Every operation performed by a client is stored in the database.

-        All failure and exception information is stored in the database.

-        When a server is re-instantiated, it acquires its state information from the database.
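
One way to read the checkpointing rules above is as an operation log that a restarted server replays to rebuild its state. The sketch below illustrates that idea with an in-memory list standing in for the database; OperationLog, SeatState, and the per-bus sold-seat counter are assumptions made for the example, not the project's actual schema.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Stand-in for the database table that records every client operation
// (assumption: the real system would use a MySQL table).
class OperationLog {
    private final List<String[]> entries = new ArrayList<String[]>();

    void append(String operation, String busNumber) {
        entries.add(new String[] {operation, busNumber});
    }

    List<String[]> entries() {
        return entries;
    }
}

// Server state rebuilt from the log: number of seats sold per bus.
class SeatState {
    private final Map<String, Integer> soldSeats = new HashMap<String, Integer>();

    void apply(String operation, String busNumber) {
        int sold = sold(busNumber);
        if ("buyTicket".equals(operation)) {
            sold++;
        } else if ("cancelTicket".equals(operation) && sold > 0) {
            sold--;
        }
        soldSeats.put(busNumber, sold);
    }

    int sold(String busNumber) {
        Integer value = soldSeats.get(busNumber);
        return value == null ? 0 : value;
    }

    // Replay the whole log in order to reconstruct the state after a restart.
    static SeatState restore(OperationLog log) {
        SeatState state = new SeatState();
        for (String[] entry : log.entries()) {
            state.apply(entry[0], entry[1]);
        }
        return state;
    }
}
```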

 

Scenarios/Interactions

Scenario:

1.      The client requests the names of the servers from the naming server.

2.      The naming server returns the names of the servers.

3.      The client sends its state to all servers. For example, if a client buys a bus ticket, the message is sent to all servers. In the normal case, the client communicates with the primary server.

a.      When the client receives an exception message, the fault is detected.

b.      The client then communicates with one of the other replica servers.

4.      The primary server communicates with the database.

5.      The primary server sends the results to the client.

* The recovery manager checks all servers every 10 seconds. If it detects a dead server, it re-instantiates that server and then obtains the present state from another replica server.
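
The periodic check described in the note above might be sketched as follows. The ManagedServer and RecoveryManager classes are hypothetical, the integer state is a placeholder for real server state, and liveness is a simple flag here where a real manager would ping the server.

```java
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical handle on one server replica.
class ManagedServer {
    final String name;
    boolean alive = true;
    int state = 0; // placeholder for the replica's application state

    ManagedServer(String name) {
        this.name = name;
    }
}

class RecoveryManager {
    private final List<ManagedServer> servers;

    RecoveryManager(List<ManagedServer> servers) {
        this.servers = servers;
    }

    // One pass: restart each dead server and copy state from a live replica.
    // Returns the number of servers recovered.
    synchronized int checkOnce() {
        ManagedServer donor = null;
        for (ManagedServer s : servers) {
            if (s.alive) {
                donor = s;
                break;
            }
        }
        if (donor == null) {
            return 0; // no live replica to copy state from
        }
        int recovered = 0;
        for (ManagedServer s : servers) {
            if (!s.alive) {
                s.state = donor.state;
                s.alive = true;
                recovered++;
            }
        }
        return recovered;
    }

    // Run checkOnce every 10 seconds, as in the note above.
    ScheduledExecutorService start() {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        timer.scheduleAtFixedRate(new Runnable() {
            public void run() {
                checkOnce();
            }
        }, 10, 10, TimeUnit.SECONDS);
        return timer;
    }
}
```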

 

* Detailed System Architecture

 

Current Status

Downloads

 


 

4.  Real-Time Fault-Tolerant Baseline Application

A.      Chief Experimenter: Wen Shu Tang

B.      Implementation Changes Required for Fault-Tolerance Evaluation

                       i.              Add timers to the client and server, and write their output to a file

                     ii.              Retrieve the hostname of the client

                   iii.              Retrieve the name of the method invoked

                   iv.              Change the size of the reply

                     v.              Change the client program to send requests without user interaction

                   vi.              Change the client program to issue 10,000 requests

C.      Scripts Required for Fault-Tolerance Evaluation

                       i.              A shell script that uses ssh to log into different machines and create multiple clients, each of which invokes the parameter script

                     ii.              A parameter script that steps through the 12 possible configurations of two parameters: reply message size and inter-request time

                   iii.              A Java file parameterized to configure the system with a particular inter-request time and message size
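
The parameter sweep could be enumerated as sketched below. The source gives only the count of 12 configurations, so the specific reply sizes and inter-request times here are placeholder assumptions chosen to produce 4 x 3 = 12 combinations; ExperimentConfig and ParameterSweep are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// One experiment configuration: reply size and gap between requests.
class ExperimentConfig {
    final int replyBytes;
    final int interRequestMillis;

    ExperimentConfig(int replyBytes, int interRequestMillis) {
        this.replyBytes = replyBytes;
        this.interRequestMillis = interRequestMillis;
    }
}

class ParameterSweep {
    // Placeholder values (assumptions); only the total of 12 comes from the text.
    static final int[] REPLY_BYTES = {256, 512, 1024, 2048};
    static final int[] INTER_REQUEST_MILLIS = {0, 20, 40};

    // Build every combination of reply size and inter-request time.
    static List<ExperimentConfig> all() {
        List<ExperimentConfig> configs = new ArrayList<ExperimentConfig>();
        for (int size : REPLY_BYTES) {
            for (int gap : INTER_REQUEST_MILLIS) {
                configs.add(new ExperimentConfig(size, gap));
            }
        }
        return configs;
    }
}
```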

D.     Design for Fault-Tolerance Evaluation

                       i.              The experiment runs as follows. The master script uses ssh to log into different machines, depending on the number of clients needed. It then calls the parameter script to step through all 48 configurations. For each configuration, a Java class is invoked to run 10,000 requests with the given parameters and write the results for the 7 nodes to the output files. Timers and name loggers are inserted in both the client and server programs and write to a log file. Additionally, we can make 3 types of requests, all of which are two-way (request and reply). These requests are invoked randomly, and their results are recorded accordingly.

 

* Retrieve Raw Data using 7 Probes

                     ii.              Issue time (client) -> client logging

                   iii.              Receive time

                   iv.              Name of invocation

1.        retrieveSchedule()

2.        buyTicket()

3.        cancelTicket()

                     v.              Time when each request is received

                   vi.              Time when each reply is completed

                 vii.              Name of invocation

               viii.              Host name of client
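
The client-side probes (issue time, receive time, and name of invocation) can be captured in a small record that is written out as one log line per request. The sketch below is illustrative only: the ClientProbe class, the nanosecond timestamps, and the comma-separated line format are assumptions.

```java
// Hypothetical record of the client-side probes for one request.
class ClientProbe {
    final String invocation; // e.g. retrieveSchedule, buyTicket, cancelTicket
    final long issueNanos;   // time the request was issued
    long receiveNanos;       // time the reply was received

    ClientProbe(String invocation, long issueNanos) {
        this.invocation = invocation;
        this.issueNanos = issueNanos;
    }

    void received(long receiveNanos) {
        this.receiveNanos = receiveNanos;
    }

    // Round-trip latency in microseconds.
    long latencyMicros() {
        return (receiveNanos - issueNanos) / 1000;
    }

    // One comma-separated line per request, suitable for appending to a log file.
    String toLogLine() {
        return invocation + "," + issueNanos + "," + receiveNanos + "," + latencyMicros();
    }
}
```

In a client, the two timestamps would typically come from System.nanoTime() taken immediately before the invocation and immediately after the reply returns.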


 

5.  High-Performance Real-Time Fault-Tolerant Baseline Application

This is our approach to a high-performance architecture. To improve performance, we will use a thread pool.

-Thread pool: a managed collection of threads. A thread pool improves performance by creating many threads during initialization. When a client sends a “retrieve Bus Schedule”, “buy Ticket”, or “cancel Ticket” request, each request is assigned to one of the pre-created threads. In general, an RT-CORBA system offers two thread pool models: the thread pool without lanes and the thread pool with lanes. To create thread pools, we will configure a thread pool in our RT-CORBA system.
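
The idea of creating the threads up front and handing each request to one of them can be illustrated with the standard java.util.concurrent classes. This is an analogy in plain Java, not the RT-CORBA thread pool API; the RequestPool class and its request strings are assumptions made for the example.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class RequestPool {
    private final ThreadPoolExecutor pool;

    RequestPool(int threads) {
        pool = new ThreadPoolExecutor(threads, threads, 0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<Runnable>());
        // Create all worker threads during initialization instead of on demand.
        pool.prestartAllCoreThreads();
    }

    // Number of worker threads currently in the pool.
    int liveThreads() {
        return pool.getPoolSize();
    }

    // Hand a request to one of the pre-created threads.
    Future<String> submit(final String request) {
        return pool.submit(new Callable<String>() {
            public String call() {
                return "done:" + request; // placeholder for the real request handling
            }
        });
    }

    // Convenience wrapper: submit and block until the result is ready.
    String submitAndWait(String request) {
        try {
            return submit(request).get();
        } catch (Exception e) {
            throw new IllegalStateException("request failed: " + request, e);
        }
    }

    void shutdown() {
        pool.shutdown();
    }
}
```

A pool with lanes would roughly correspond to several such pools, one per priority level, while a pool without lanes shares one set of threads across all requests.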


Update (05/04/2006)